Intel Corporation is a leading supplier of microcomputer components,
modules and systems. When Intel first introduced the microprocessor in 1971,
it created the era of the microcomputer. Today, Intel architectures are considered
world standards. Intel products are used in a wide variety of applications, including
embedded systems such as automobiles, avionics systems and telecommunications
equipment, and as the CPU in personal computers, network servers and
supercomputers. Others bring enhanced capabilities to systems and networks.
Intel's mission is to deliver quality products through leading-edge technology.

Intel Corporation makes no warranty for the use of its products and assumes no responsibility for any errors
which may appear in this document nor does it make a commitment to update the information contained
herein.

INTEL SERVICE


INTEL’S COMPLETE SUPPORT SOLUTION WORLDWIDE 


Intel Service is a complete support program that provides Intel customers with hardware support, software 
support, customer training, and consulting services. For detailed information contact your local sales offices. 


Service and support are major factors in determining the success of a product or program. For Intel this 
support includes an international service organization and a breadth of service programs to meet a variety of 
customer needs. As you might expect, Intel service is extensive. It can start with On-Site Installation and 
Maintenance for Intel and non-Intel systems and peripherals, Repair Services for Intel OEM Modules and
Platforms, Network Operating System support for Novell NetWare and Banyan VINES software, Custom
Integration Services for Intel Platforms, Customer Training, and System Engineering Consulting Services. Intel
maintains service locations worldwide. So wherever you’re using Intel technology, our professional staff is
within close reach.


ON-SITE INSTALLATION AND MAINTENANCE 


Intel’s installation and maintenance services are designed to get Intel and Intel-based systems and the
networks they use up and running fast. Intel’s service centers are staffed by trained and certified Customer
Engineers throughout the world. Once systems are installed, Intel is dedicated to keeping them running at
maximum efficiency, while controlling costs.


REPAIR SERVICES FOR INTEL OEM MODULES AND PLATFORMS


Intel offers customers of its OEM Modules and Platforms a comprehensive set of repair services that reduce
the costs of system warranty, maintenance, and ownership. Repair services include module or system testing
and repair, module exchange, and spare part sales.


NETWORK OPERATING SYSTEM SUPPORT


An Intel software support contract for Novell NetWare or Banyan VINES software means unlimited access to 
troubleshooting expertise any time during contract hours—up to seven days per week, twenty-four hours per 
day. To keep networks current and compatible with the latest software versions, support services include access 
to minor releases and “patches” as made available by Novell and Banyan. 


CUSTOM SYSTEM INTEGRATION SERVICES 


Intel Custom System Integration Services enable resellers to order completely integrated systems assembled 
from a list of Intel386™ and Intel486™ microcomputers and validated hardware and software options. These 
services are designed to complement the reseller’s own integration capabilities. Resellers can increase business 
opportunities, while controlling overhead and support costs.


CUSTOMER TRAINING 


Intel offers a wide range of instructional programs covering various aspects of system design and
implementation. In just three to five days, a limited number of individuals learn more in a single workshop
than in weeks of self-study. Covering a wide variety of topics, Intel’s major course categories include:
architecture and assembly language, programming and operating systems, BITBUS™, and LAN applications.


SYSTEM ENGINEERING CONSULTING 


Intel provides field system engineering consulting services for any phase of your development or application
effort. You can use our system engineers in a variety of ways, ranging from assistance in using a new product,
developing an application, personalizing training, and customizing an Intel product to providing technical and
management consulting. Working together, we can help you get a successful product to market in the least
possible time.
DATA SHEET DESIGNATIONS


Intel uses various data sheet markings to designate each phase of the document as it
relates to the product. The marking appears in the upper, right-hand corner of the data
sheet. The following is the definition of these markings:


Data Sheet Marking       Description

Product Preview          Contains information on products in the design phase of
                         development. Do not finalize a design with this
                         information. Revised information will be published when
                         the product becomes available.

Advanced Information     Contains information on products being sampled or in
                         the initial production phase of development.*

Preliminary              Contains preliminary information on new products in
                         production.*

No Marking               Contains information on products in full production.


*Specifications within these data sheets are subject to change without notice. Verify with your local Intel sales
office that you have the latest data sheet before finalizing a design.
82750DB


DISPLAY PROCESSOR


■ Programmable Video Timing
  - 28 MHz and 45 MHz Operating Frequency
  - Pixel/Line Address Range to 4096
  - Fully Programmable Sync, Equalization, and Serration Components
  - Fully Programmable Blanking and Active Display Start and Stop Times
  - Genlocking Capability

■ Flexible Display Characteristics
  - 8-, Pseudo 16-, 16-, and 32-Bit/Pixel Modes
  - Selectable Pixel Widths of 1.0, 1.5, 2.0, 2.5, through 14 Periods of the Input Frequency
  - Support Popular Display Resolutions: VGA, XGA, NTSC, PAL, and SECAM
  - On-Chip Triple DAC for Analog RGB/YUV Output
  - Mix Graphics and Video Images on a Pixel by Pixel Basis
  - Real Time Expansion of the Reduced Sample Density Video Color Components (U, V) to Full Resolution
  - Three Independently Addressable Color Palettes
  - Programmable 2X Horizontal Interpolation of Y Channel
  - 16 x 16 x 2-Bit Cursor Map with Independently Programmable 2X Expansion Factors in X and Y Dimensions
  - YUV to RGB Color Space Conversion
  - 2X Vertical Replication of Y, U, and V Data for Displaying Full Motion Video on VGA Monitor
  - Register and Function Compatible with the 82750DA

Intel’s 82750DB is a custom designed VLSI chip used for processing and displaying video graphic information. 
It is register and function compatible with the 82750DA. 


Reset inputs allow the 82750DB to be genlocked to an external sync source. By programming internal control
registers, this sync can be modified to accommodate a wide variety of scanning frequencies. A large selection
of bits/pixel, pixels/line, and pixel widths is programmable, allowing wide latitude in trading off image
quality against update rate and VRAM requirements.


The 82750DB can operate in a digitizing mode, in which it generates timing and control signals to the 82750PB
and VRAM, but does not output display information. Besides digitizer support signals and video
synchronization, the 82750DB outputs digital and analog RGB or YUV information and an 8-bit digital word of
alpha data. This alpha channel data may be used to obtain a fractional mix of 82750DB outputs with another
video source.
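
As an illustration of the fractional mix described above, the following is a minimal sketch of how an off-chip mixer might combine one 8-bit 82750DB output component with the matching component of another video source. The linear blend, the convention that an alpha of 255 selects the 82750DB output entirely, and the name `mix_component` are assumptions made for this example; the data sheet does not define the mixer's arithmetic.

```python
def mix_component(db_value: int, other_value: int, alpha: int) -> int:
    """Blend one 8-bit 82750DB component with another source's component.

    Assumed convention (not from the data sheet): alpha = 255 passes the
    82750DB value unchanged, alpha = 0 passes the other source unchanged.
    """
    for value in (db_value, other_value, alpha):
        if not 0 <= value <= 255:
            raise ValueError("components and alpha must be 8-bit values (0..255)")
    # Weighted sum of the two sources, rounded to the nearest integer.
    return (alpha * db_value + (255 - alpha) * other_value + 127) // 255
```

A real mixer would apply the same weighting to each of the three output channels for every pixel, using the alpha word that accompanies that pixel.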


[Figure: blocks labeled Video Input, Video Digitizer, Serial Shift Register, 82750DB, and Video Mixer/Display,
with the DATAIN[31:0] bus and the Video Output]

82750DB Subsystem Diagram


Intel Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in an Intel product. No other circuit patent
licenses are implied. Information contained herein supersedes previously published specifications on these devices from Intel.
1.0 82750DB PIN DESCRIPTION


Pinout


[Figure: top view of the 82750DB package with pins numbered around the perimeter. Legible signal labels
include PIXCLK, IREF, ACTDIS, ALPHA, DRV, VBUS[3:0], SCLK[1:0], DATAIN, TESTACT#, VCC, VSS, AVCC, and AVSS.
Drawing 240855-2.]


Figure 1-1. 82750DB Pinout
Table 1-1. Pin Cross Reference by Pin Name


Location
87
88


Pin Name
DATAIN[15]
DATAIN[14]
DATAIN[13]
DATAIN[12]
DATAIN[11]
DATAIN[10]
DATAIN[9]
DATAIN[8]
DATAIN[7]
DATAIN[6]
DATAIN[5]
DATAIN[4]


Table 1-2. Pin Cross Reference by Location


Figure 1-2. 82750DB Functional Signal Groupings
Table 1-3. Pin Descriptions


Symbol
Name and Function


FREQIN
FREQUENCY INPUT CLOCK: In normal use, the 82750DB supplies refresh
timing for an associated VRAM through the 82750PB. This places a lower limit
on the line frequency (the line period is a programmed number of FREQIN
periods). It must generate enough refresh cycles, so a minimum line rate of
4 kHz is required. Furthermore, the 82750PB may run no less than 1/2 the speed
of the 82750DB, since the 82750PB samples the timing and control signals
generated by the 82750DB. The period of FREQIN is known as a "T" cycle.


RESETB#
EXTERNAL RESET: Input signal which places all units in the 82750DB into an
initialized state, and sets the transfer rate to a default value of 1/3X the
operating frequency. It is an edge sensitive input which must be held low for a
minimum of ten T-cycles. The slowest transfer rate is selected to ensure that
the 82750DB will read the register information correctly during the first register
transfer, independent of the speed of the VRAMs. During the reset state, the
analog video outputs and digital outputs are set to the black level. This will
occur a maximum of four cycles after RESETB# is set to a zero. This signal is
also used in conjunction with the TESTACT# input to disable outputs.


VRESET#
VERTICAL RESET: By programming a bit in an internal register, the 82750DB
may be placed in the Genlock mode. If this mode is selected, assertion of
VRESET# resets all vertical timing to the first line of the next field. It does not
affect the horizontal timing, but does generate the on-chip end of field signals. It
is an edge sensitive input that is sampled in the 82750DB at the internal time
corresponding to the rising edge of FREQIN. If the Genlock mode has not been
enabled, this signal will have no effect on the sync timing. The 82750DB will
then operate in a free-running mode. Refer to Chapter 3 for a detailed
description of genlocking the 82750DB.


HRESET#
HORIZONTAL RESET: When in the Genlock mode, this input will reset all of
the horizontal timing to the start of the line (beginning of horizontal sync).
HRESET# does not affect vertical timing (except for an up-to one-line delay) or
any other 82750DB registers. This signal is an edge sensitive input that is
sampled in the 82750DB at the internal time corresponding to the rising edge
of FREQIN. As was the case with the VRESET# signal, this input will be
ignored when not in the Genlock mode.


VBUS[3:0]
VDP COMMUNICATION BUS: The 82750DB outputs status and VRAM transfer
requests over these lines to the 82750PB, for 2 to 16 T-cycles (as programmed
by the user). Transfer requests can tie up the 82750DB/VRAM, 82750PB/VRAM,
or 82750PB/82750DB (VBUS) interfaces for a longer period due to
VRAM arbitration. When signals are not being sent out, the VBUS has value
1111, the "null command."


SCLK[1:0]
VRAM SHIFT CLOCKS: Transfer requests to the 82750PB cause a VRAM
address to be set up, and the VRAM serial registers loaded (in the case of
displaying) or unloaded (in the case of digitizing). These signals are used to shift
data out of and into the VRAMs. Both signals are identical, and run at a
maximum rate of 1X of the pixel frequency, except during transfer requests, at
which time they run at 1X, 1/2X, or 1/3X of the operating frequency of the
82750DB, as programmed by the user.


DATAIN[31:0]
DATA INPUT BUS: This is the input data clocked in from VRAM by the
SCLK[1:0] signals. The format of the input data is a function of the programmed
number of bits/pixel and of the type of transfer cycle being executed. Data will
be sampled on the rising edge of FREQIN.
Table 1-3. Pin Descriptions (Continued)


Symbol
Name and Function


FCO
FRAME CAPTURE ON: This is the output signal which indicates to the digitizer
that the VRAM serial port has been turned from read mode to write mode. The
digitizer may then drive the (common) VRAM serial register data I/O pins. FCO
will be asserted after the programmer specifies digitization, five lines after the
start of the active vertical display, at the time of HSYNC. This gives the external
logic time to switch directions of the VRAM serial data bus. This signal will end
four lines after vertical active stops, at the next HSYNC, to make sure the digitizer
is off before the next beginning-of-field register transfer.


HSYNC
HORIZONTAL SYNCHRONIZATION: Video synchronization signal which is
asserted at the beginning of every line and ends a programmed time later. (The
duration of this signal is specified in T-cycles.)


VERTICAL SYNCHRONIZATION: Video synchronization signal which can be
programmed to start (once) and end (once) in every field. (The start and stop
position may be specified in half-line units.)


COMPOSITE SYNCHRONIZATION PULSE: This contains the programmed
vertical serration and equalization information, as well as horizontal
synchronization pulses.


BURST GATE: This signal starts and stops at user-programmable horizontal
positions in each line, in a programmable vertical group of lines. The primary use
of this signal is to provide a "window" during which the BURST output should be
inserted to generate a baseband NTSC signal. The output frequency is set by an
integer divisor (0-31) and the rate of the FREQIN clock input. To use this
effectively, the 82750DB must operate at an integer multiple of the NTSC 3.58
MHz color subcarrier. The number is programmed in two's complement form in
the General Control register.


PIXCLK
PIXEL CLOCK: This output signals valid data on the DGY, DRV, DBU, GY, RV,
and BU lines. PIXCLK becomes active one-half of a T-cycle after valid data
appears on DGY, DRV, or DBU, and coincident with GY, RV, and BU. During
active display time it is issued at a steady rate of 1/(T-cycles/pixel) times per
T-cycle, and otherwise at a steady rate of once per T-cycle. Its duration is one-half
of a T-cycle, and its rising edge may synchronize with either rising or falling edges
of FREQIN depending on the pixel frequency. This signal may be used to
synchronize off-chip processing of the pixel data outputs.


GY, RV, BU
ANALOG PIXEL OUTPUTS: These signals are the processed pixel data from the
82750DB in analog form. During the display, these signals may be programmed to
output pixel data in either YUV or RGB format.


DGY[7:0],
DRV[7:0],
DBU[7:0]
DIGITAL VIDEO OUTPUTS: These are the digital outputs of the GY, RV, and BU
channels, respectively. They are valid with respect to the rising edge of PIXCLK.


ALPHA[7:0]
ALPHA CHANNEL: These 8 bits are used to output a digital value for mixing the
82750DB output with another video signal off-chip. The alpha channel information
may be included in the pixel data, or may be output based on a comparison of the
pixel data with user-programmed values.


ACTDIS
ACTIVE DISPLAY: This is the active portion of the display as programmed by the
user. It is delayed by the pipeline through the 82750DB, which is 5 lines vertically
and a variable number horizontally, depending on the display mode.
Table 1-3. Pin Descriptions (Continued)


Symbol
Name and Function


BPP[1:0]
BITS PER PIXEL: During the nonactive display, the user programmed bits/pixel is
encoded on these lines. During active display, the BPP[0] signal is multiplexed
with a signal, Cursor Active, which indicates if the cursor data is currently active
(non-transparent). When the Cursor Active output signal is asserted, this indicates
that cursor overlay data is currently being output. Also during the active display,
the BPP[1] signal is multiplexed with a signal, VUGR, which indicates whether the
82750DB is operating in a graphics or video mode. When the VUGR output signal
is asserted, this indicates the G, R, and B outputs are derived from the
subsampled VU data. These pins allow users to latch the BPP[1:0] signals during
nonactive display time (as indicated by ACTDIS being zero) for post-processing of
the 82750DB output. The active cursor window on BPP[0] can be used during
active display to multiplex other video streams into the output display. The
following table illustrates the encoding on the BPP pins.

Bits/Pixel   ACTDIS   BPP[1]            BPP[0]
8            0        bits/pixel code   bits/pixel code
16           0        bits/pixel code   bits/pixel code
32           0        bits/pixel code   bits/pixel code
pseudo 16    0        bits/pixel code   bits/pixel code
8            1        VUGR              Cursor Active
16           1        VUGR              Cursor Active
32           1        VUGR              Cursor Active
pseudo 16    1        VUGR              Cursor Active


DISDAC
DISABLE ANALOG OUTPUTS: When this input is active, the Analog Pixel
Outputs are set to a high-impedance state.


DISABLE DIGITAL OUTPUTS: When this input is active, the digital outputs of the
82750DB will be set to zero. In applications that use only the analog outputs of the
82750DB, the digital outputs must be disabled.


TESTACT#
TEST ACTIVE: Active low signal that is used in conjunction with the RESETB#
signal to allow the chip to perform one of the following functions:

Enter Reset State
Enter Reset State
Tristate All Outputs
Analog Outputs are Zero
Normal Operation
Reserved


TEST#
TEST INPUT: This signal must be set to VCC to guarantee correct chip operation.


IREFIN
ANALOG CURRENT REFERENCE: Under normal operation, this signal should be
tied to a temperature compensated current reference to AVSS. This signal must
be decoupled to AVCC.


INTERNAL VOLTAGE REFERENCE: This signal must be decoupled to AVCC.


AVCC
ANALOG POWER pin provides +5 VDC supply to the Digital to Analog Converter.


AVSS
ANALOG GROUND pin provides the 0V connection to which the analog outputs
are referenced. This must be connected to VSS.


VCC


VSS
GROUND pins provide the 0V connection to which all inputs and outputs are
referenced.


All output pins have an active level of HIGH, and are
floated when RESETB# and TESTACT# are set to
a zero. The exceptions are GY, RV, and BU which
will be forced to a zero level.


2.0 ARCHITECTURE


Overview


There are 10 units in the 82750DB. Each of the units
operates independently at the maximum clock rate
input to the chip. The control information for each
block is distributed in programmable registers
throughout the chip. These registers are loaded on
user-specified lines during the horizontal and vertical
blanking intervals of the field. The register data that
was read in from VRAM is passed from block to
block during the blanking intervals of the display, on
the same lines that the pixel information is passed
during the active display. The Functional Block
Diagram is shown in Figure 2-1.


In order to maximize speed and compensate for
processing delays, the chip is heavily pipelined. All
inter-block information is delay-equalized to
accommodate the different pipeline lengths in each
module. As a result, the total pipeline delay is dependent
on the number of processing units that are used to
generate the display. Chapter 4 describes how the
user programming is affected by these pipeline delays.


Each of the units is described in more detail in the
following sections of this chapter.


Sync Generation and Timing


The sync generation and timing block generates all
of the internal timing and control signals, as well
as the video synchronization signals. Sync and timing
information may be derived from two sources:
from the master clock, in which case the control
registers on the 82750DB are programmed to provide
the desired display frequency in terms of periods of
the master clock (T-cycles), or from the horizontal
and vertical external reset signals. (The latter
is known as the genlock mode.) Characteristics
such as line rate, blanking and border intervals, and
composite synchronization parameters can be
independently set. Since the 82750DB can be
reprogrammed once each line, horizontal strips of
different resolutions can be supported on the same
display. However, the horizontal strips that can be
supported are limited by the host processor’s
response to redefining the bitmap pointers resident on
the 82750PB.


The horizontal and vertical display parameters are
fully programmable. Figure 2-2 illustrates the
horizontal programming parameters. The line starts at
the programmed start position, with the length of
half of a line programmed in T-cycles. The length of
the total line is twice the half-line length. Parameters
such as horizontal sync start, horizontal sync width,
horizontal blanking start and stop, and horizontal
active start and stop are all specified by the user. Note
that the border time is not explicitly programmed, but
is defined as the region of the display line where
neither active display nor blanking is programmed to
occur. In order for the 82750DB to function correctly,
the width of the horizontal active display should be
programmed such that the end of the horizontal
active display coincides with the end of the last
displayed pixel.


Figure 2-3 shows the vertical programming
parameters. The basic unit for vertical programming is
the half line, with the half-line count for each
field starting at zero. Where appropriate for a
parameter, the count is programmed in units of full lines.
The length of the complete field is programmed in
half lines, which makes it convenient for
distinguishing between interlaced and non-interlaced displays.
(For interlaced displays, the number of half lines is
odd; for non-interlaced displays, it is even.) The
vertical active and blanking regions may be
independently programmed, with the border time defined as
the region where neither blanking nor active display is
on.


NOTE:
Sync parameters are completely independent of
the display parameters. This allows the sync
signals to be positioned anywhere in the field (even
during active display).
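
The relationships among FREQIN, the programmed half-line length, and the half-line count per field can be sketched as a few helper functions. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 890 T-cycle half-line used in the usage note is an arbitrary example value, and the 4 kHz floor is the minimum line rate quoted in the FREQIN pin description.

```python
MIN_LINE_RATE_HZ = 4_000  # minimum line rate required for VRAM refresh


def line_rate_hz(freqin_hz: float, half_line_t_cycles: int) -> float:
    """Line rate implied by FREQIN and the programmed half-line length."""
    if half_line_t_cycles <= 0:
        raise ValueError("half-line length must be a positive number of T-cycles")
    # A full line is twice the half-line length; one T-cycle is 1 / FREQIN.
    return freqin_hz / (2 * half_line_t_cycles)


def meets_refresh_minimum(freqin_hz: float, half_line_t_cycles: int) -> bool:
    """True if the programmed line rate satisfies the refresh requirement."""
    return line_rate_hz(freqin_hz, half_line_t_cycles) >= MIN_LINE_RATE_HZ


def is_interlaced(half_lines_per_field: int) -> bool:
    """An odd number of half lines per field denotes an interlaced display."""
    return half_lines_per_field % 2 == 1
```

For example, a 28 MHz FREQIN with a half-line length of 890 T-cycles gives a line rate of about 15.73 kHz, and a field of 525 half lines (262.5 lines) is classified as interlaced.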

Figure 2-1. 82750DB Unit Level Diagram

VBUS Control


The VBUS controller sends all 82750DB requests
for display bitmaps, VRAM refresh, and synchronization
information to the 82750PB, at programmable
times during a field. Transfer requests are scheduled
to occur on a line basis, so only their vertical position
(or line) is specified by the user. Other commands,
like refresh requests, occur every line, and their
horizontal position (or dot position) in the line must be
specified by the user. Transfer requests are given
the highest priority by the VBUS control circuit and
are performed first during a blanking interval. The
programmer has the responsibility of scheduling the
line oriented codes, like refresh, so that they do not
collide with the transfer requests.


Besides arbitrating the scheduled transfer requests,
the VBUS controller also reads the data from the
VRAM shift registers using the two shift clock outputs
(SCLK[1:0]). The code corresponding to the
type of data to be read is asserted for a programmable
number of cycles on the 4-bit VBUS. The
82750DB then waits a programmable delay before
reading the data from the VRAM. This delay should
be long enough to guarantee that the 82750PB has
completed loading the information into the serial
shift register of the VRAM. Both signals are off while
the code causing the transfer cycle is active on the
VBUS, as well as during the read delay time. Figure 2-4
illustrates this communication between the
82750PB and the 82750DB.


When the delay wait is over, the shift clock outputs
are activated. The behavior of the SCLK[1:0] signals
depends on the transfer rate that the user has selected:
1X, 1/2X, or 1/3X the operating frequency.
Note that if the RESETB# signal is applied,
the transfer rate is automatically set to 1/3X during
the first automatic register transfer, regardless of the
state of the transfer rate selection. The transfer rate
may be changed in the first register transfer after
RESETB# is set to a logic one value.


Figure 2-5 illustrates how the SCLKs operate in the
1X mode in a system. SCLK[1:0] signals will toggle
between zero and one on the rising edge of
FREQIN, after an internal logic delay. The data is
read into the 82750DB on the rising edge of the
internal clock, one 82750DB clock cycle after the
SCLK outputs are asserted. Since there are 32 data
input pins, each SCLK can read in the serial data
from eight 256 x 4 VRAM memory devices. Adding
external buffering to the SCLKs (to drive more memory)
will also add delay to the memory access. This
delay increase may require more than one T-cycle
before the VRAM data is valid. In this case, the time
between the rising edge of the internal 82750DB
clock that generates the SCLKs and the edge that
latches the data must be increased.


There are two solutions: the operating frequency of
the 82750DB can be lowered to accommodate a longer
T-cycle, or the 1/2X SCLK mode may be selected
(as shown in Figure 2-6). When using the 1/2X
transfer rate, the data is read into the 82750DB on
the rising edge of the internal clock, two 82750DB
clock cycles after the SCLK outputs are asserted.


[Figure: timing of a VBUS transfer request followed by data words DATA0 through DATA2. Annotations:
"Programmable 82750DB VBUS Code Length (2 to 16 82750DB Clock Cycles)"; "Programmable 82750DB Delay";
"The 82750PB must have finished decoding the VBUS code."; "The 82750PB must have executed the 82750DB
transfer request. (DATA should be in the serial shift register of the VRAM.)"]


Figure 2-4. 82750PB/82750DB Communication

Figure 2-7 illustrates 1/3X (default) shift clock operation
that is used during the RESET mode or may be
programmed by the user. The first word of data is
latched by the 82750DB on the rising edge of the
FREQIN that is three T-cycles after the SCLK outputs
were asserted. This allows three full 82750DB
cycles for the VRAMs to output valid data, which
gives extra margin for applications that need longer
shift read cycles (due to slower memories or external
logic delays) and do not wish to operate the
82750DB at a slower speed.
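
The trade-off between the three shift clock modes can be made concrete with a small calculation. The sketch below is illustrative only: it treats the window between SCLK assertion and the latching edge as exactly one, two, or three T-cycles, ignoring the internal logic delay and setup time shown in the timing diagrams, and the names `latch_window_ns` and `fastest_mode_for` are hypothetical.

```python
# T-cycles between SCLK assertion and the edge that latches the data.
SCLK_MODE_CYCLES = {"1X": 1, "1/2X": 2, "1/3X": 3}


def latch_window_ns(freqin_hz: float, mode: str) -> float:
    """Nominal time, in ns, that the VRAM has to present valid data."""
    return SCLK_MODE_CYCLES[mode] * 1e9 / freqin_hz


def fastest_mode_for(freqin_hz: float, required_ns: float):
    """Return the fastest mode whose window covers required_ns, else None."""
    for mode in SCLK_MODE_CYCLES:  # dict order runs from fastest to slowest
        if latch_window_ns(freqin_hz, mode) >= required_ns:
            return mode
    return None  # no mode suffices; FREQIN itself would have to be lowered
```

At 45 MHz, for instance, a memory path needing 40 ns would not fit the 1X window of about 22.2 ns but would fit the 1/2X window of about 44.4 ns.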


[Timing diagrams 240855-8, 240855-9, and 240855-10: FREQIN, SCLK[1:0], and VRAM data waveforms, with one
T-cycle and the Tpsclk, Taccess, and Tsetup intervals marked.]


Figure 2-7. 82750DB 1/3X Shift Clock Operation

When reading data from memory during active display,
the SCLK[1:0] outputs operate at a rate required
to support the programmed display rate. This
rate is determined from the following equation:

$$\text{RATE} = \frac{\#\,\text{bits/pixel}}{(32\ \text{bits/word}) \times (\#\,\text{words/fetch}) \times (\#\,\text{T-cycles/pixel})}$$

where: # bits/pixel and # T-cycles/pixel are user-programmed
       # words/fetch is: 1


The SCLK[1:0] outputs will be the same frequency
as the input clock in the 1X shift clock mode, and
one half the input clock frequency when using the
1/2X mode. The frequency will be one third of the
input clock frequency when using the 1/3X mode. In the 1/3X
mode the SCLK[1:0] outputs will be high for one
T-cycle, and low for 2 T-cycles.
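
The active-display rate equation given earlier in this section can be evaluated directly. The following minimal sketch assumes that RATE is expressed in SCLK cycles per T-cycle, which is an interpretation rather than something the text states; the function name `sclk_rate` is hypothetical. Exact fractions are used so that half-integer pixel widths such as 1.5 T-cycles do not introduce rounding error.

```python
from fractions import Fraction


def sclk_rate(bits_per_pixel: int, t_cycles_per_pixel, words_per_fetch: int = 1) -> Fraction:
    """RATE = bits/pixel / (32 bits/word * words/fetch * T-cycles/pixel)."""
    if bits_per_pixel <= 0 or words_per_fetch <= 0 or t_cycles_per_pixel <= 0:
        raise ValueError("all parameters must be positive")
    # 32 is the width of the DATAIN bus in bits.
    denominator = 32 * words_per_fetch * Fraction(t_cycles_per_pixel)
    return Fraction(bits_per_pixel) / denominator
```

With 8 bits/pixel and one T-cycle per pixel the result is 1/4, since each 32-bit fetch carries four pixels; with 32 bits/pixel and one T-cycle per pixel the shift clocks must run at the full 1X rate.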


VBUS CODE DESCRIPTION


When the 82750DB is actively fetching and displaying
pixels, VUXFER, BMX/YBMNPX, and REGX are
typically sent over the VBUS. Of the three codes,
REGX has top priority, followed by VUXFER, and
last by BMX/YBMNPX. These commands may be
programmed to occur each active line during the
blanking interval for the line just completed. If a
register transfer has been programmed for an active
line, it takes priority and is executed first. Otherwise,
immediately after the register transfer, any scheduled
VUXFER and BMX/YBMNPX commands are
executed. The programmer has the responsibility for
verifying that the sum of times required by these
commands does not exceed horizontal non-active
display time. The 82750DB will commence fetching
pixels at the subsequent start of active display. A
detailed explanation of the different types of VBUS
commands and their corresponding codes follows.


Transfer Requests


The following commands request the 82750PB to
transfer information from the VRAM array into the
VRAM shift register. When multiple requests are
programmed for a given line, they are listed in the
priority they are sent. When asserting a transfer request,
the programmer must be aware of two other
programmed parameters, VBLEN and SCLK delay.


The VBLEN parameter is a user programmed value
whose bits lie in the General Control Register. It is
the length of time, in 82750DB T-cycles, that a
particular VBUS code will be held at the outputs. It is
used to ensure that the asynchronously operating
82750PB chip will have enough time to recognize
and begin operating on an 82750DB transfer request.


The other parameter the programmer needs to set is
the SCLK delay. This can be found in the Pixel Control
Register. It is the number of 82750DB clock cycles
that the DB will wait before clocking in data, out
of the VRAM, after the initiation of a transfer request
on the VBUS outputs.


REGX (0010) This command requests that the
82750PB transfer 82750DB register information into
the VRAM shift registers. Besides the automatic
82750DB register transfer that occurs on the second
line (line 2) of each field, the programmer can specify
the next horizontal line on which another register
transfer is to take place. The transfers may be
scheduled many times during the field. On the first
transfer, the 82750PB uses the contents of its
82750DBc register as the starting address of the
82750DB register data. On each subsequent access,
the programmed pitch value in 82750PB’s
82750DBc-PITCH register is added to the accumulated
start address. The programmer must ensure
that the data is stored in VRAM at the correct address.
Since the pitch remains constant, the longest
register load will determine the pitch value.


The VBUS unit performs a vertical checksum on all
the register information. Each bit in the register word
undergoes an exclusive-OR with the corresponding
bit in the previous data word. The 82750DB compares
this information with the user generated
checksum, which is the last 32-bit data word read
into the 82750DB during a register transfer. If the
values do not match, the 82750DB will disable all of
its digital sync and data outputs, enter the reset
state, and send a SHUTDOWN code (82750DBSD)
to the 82750PB over the VBUS[3:0] outputs. If the
new checksum is correct, the new register values
will take effect immediately.
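
The vertical checksum described above can be sketched in a few lines of Python. This is only an illustration of a bitwise exclusive-OR accumulated across 32-bit words. The function names, the zero starting value, and the choice to accumulate the full 32-bit word (address byte included) are assumptions, not details taken from this document.

```python
from functools import reduce
from operator import xor

WORD_MASK = 0xFFFFFFFF  # register words are 32 bits wide


def vertical_checksum(words):
    """XOR all words together, bit position by bit position.

    Assumption: the running value starts at zero.
    """
    return reduce(xor, (w & WORD_MASK for w in words), 0)


def register_load_is_valid(words_with_checksum):
    """Treat the last word read as the user-generated checksum.

    Returns True when it matches the checksum of the preceding words.
    """
    *data, user_checksum = words_with_checksum
    return vertical_checksum(data) == (user_checksum & WORD_MASK)
```

A host program could use `vertical_checksum` to produce the final word of a register list before storing the list in VRAM.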


VUXFER (0001) This code is used to request VU
data, providing new VU data is required by the
82750DB. This command is issued only on vertically
active lines (as programmed in the register, not as
seen on the screen) and possibly the four lines after.
On each line, a row of V and/or U samples is loaded
into the VU interpolator line stores. The pattern of
requests depends upon the mode in which the VU
interpolator is operating. In the interlaced VU mode,
one line of samples for both the V and U components
is fetched during each transfer; in the non-interlaced
VU mode, only one line of samples for either
the V or U components is fetched. Table 2-1
illustrates the pattern of requests. M is the programmed
first vertical active line, and N the last active
line. The modes listed have VU transfer requests
following the end of horizontal active of the
lines specified, stopping with the last line, N + 4.




Table 2-1. VU Transfer Request Patterns


Mode                  | Active Line | Request
2x Non-Interlaced     | M           | Fetch 1st Line of V
                      |             | Fetch 1st Line of U
                      |             | Fetch 2nd Line of V
                      |             | Fetch 2nd Line of U
                      |             | Fetch Last Line of V
2x Interlaced         |             | Fetch 1st Line of V and U
(Odd and Even Fields) |             | Fetch 2nd Line of V and U
                      |             | Fetch 3rd Line of V and U
                      |             | Fetch Last Line of V and U
4x Non-Interlaced     |             | Fetch 1st Line of V
                      |             | Fetch 1st Line of U
                      |             | Fetch 2nd Line of V
                      |             | Fetch 2nd Line of U
                      |             | Fetch 3rd Line of V
                      |             | Fetch Last Line of V
4x Interlaced         |             | Fetch 1st Line of V and U
(Odd and Even Fields) |             | Fetch 2nd Line of V and U
                      |             | Fetch 3rd Line of V and U
                      |             | Fetch Last Line of V and U


The 82750PB uses another internal pointer to cause
the VRAM to load the desired VU data into its shift
registers (incrementing the pointer by a pitch value).
This command is asserted for a programmable number
of T-cycles (m), as specified in the Miscellaneous
Control register. Then, the 82750DB fetches
them, tying up the 82750DB/VRAM interface for
(n + 2) cycles, where n is 1/4 the programmable total
number of 8-bit samples of V and U fetched. Note
that one extra word, which may overlap the next
VBUS command, is fetched.
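
As a quick worked example of the (n + 2) figure above, the following sketch computes how long a VU fetch occupies the 82750DB/VRAM interface. The function name is hypothetical, and the check that the sample total is a multiple of four is an assumption made here because four 8-bit samples fit in one 32-bit word.

```python
def vu_fetch_cycles(total_vu_samples):
    """Cycles the 82750DB/VRAM interface is tied up by one VU fetch.

    total_vu_samples is the programmed total number of 8-bit V and U
    samples; n is one quarter of that total.
    """
    if total_vu_samples <= 0 or total_vu_samples % 4:
        # Assumption: totals are positive whole numbers of 32-bit words.
        raise ValueError("sample total must be a positive multiple of 4")
    n = total_vu_samples // 4
    return n + 2
```

For instance, 256 V samples plus 256 U samples give n = 128, so the interface is busy for 130 cycles.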


By setting a bit in the Miscellaneous Control register,
it is possible to replicate lines of V and U generated
by the interpolator for the entire field. Since each
line of VU data is displayed twice, the rate at which the
VU sample map has to be fetched from VRAM is
reduced by one half. Table 2-2 lists the sequence of VU
loads.


In some cases, the VU interpolator may cover only a
portion of the display. In those instances, M in the
above examples would be the first line that VU
interpolation is enabled. N would be the last line that VU
interpolation is enabled. Regardless of the state of
the Line Replicate bit, there would be no vertical
pipeline delay between the loading of the first line of
samples and the second line of samples. The first
line of samples would be loaded at M-1, and the
second line at M. This reduces the delay between
switching interpolation modes during a single
display.


Table 2-2. VU Transfer Request Patterns
with Line Replicate


Mode                  | Active Line | Request
2x Non-Interlaced     | M           | Fetch 1st Line of V
                      |             | Fetch 1st Line of U
                      |             | Fetch 2nd Line of V
                      |             | Fetch 2nd Line of U
                      |             | Fetch 3rd Line of V
                      |             | Fetch 3rd Line of U
                      |             | Fetch Last Line of V
2x Interlaced         |             | Fetch 1st Line of V and U
(Odd and Even Fields) |             | Fetch 2nd Line of V and U
                      |             | Fetch 3rd Line of V and U
                      |             | Fetch Last Line of V and U
4x Non-Interlaced     |             | Fetch 1st Line of V
                      |             | Fetch 1st Line of U
                      |             | Fetch 2nd Line of V
                      |             | Fetch 2nd Line of U
                      |             | Fetch 3rd Line of V
                      |             | Fetch 3rd Line of U
                      |             | Fetch Last Line of V
4x Interlaced         |             | Fetch 1st Line of V and U
(Odd and Even Fields) | M + 4       | Fetch 2nd Line of V and U
                      | M + 8       | Fetch 3rd Line of V and U
                      | N + 4       | Fetch Last Line of V and U


BMX (0000) This command requests a bitmap.
BMX (0000) is sent after horizontal active stops,
beginning on the fifth line after vertical active starts,
and continuing until the fifth line after vertical active
stops. (There is a vertical pipeline delay of five lines
through the 82750DB, due to internal timing
requirements.) A line programmed to start at line M will
have its first active line displayed at line M + 5. The
82750PB uses an internal pointer to cause the
VRAM shift registers to be loaded with pixel values.
The 82750DB subsequently fetches them as
required for display. This command is asserted on the
VBUS for the user-programmed number of T-cycles
and must be completed before active display begins.


YBMNPX (0100) This command performs a Y bitmap
transfer without performing a pitch calculation.
When the line replicate mode is selected by Bit 22 in
the Miscellaneous Control register, this code is
asserted every other display line so that the same line
of information can be used twice.




Digitizer Commands


When in the line replicate mode, and digitizing an
NTSC source (for example, when genlocking an
NTSC source to a system that uses only a VGA
monitor), each line of captured data is effectively
output at twice the rate. Since each line need only
be stored once in memory (it is duplicated automatically
in the display mode), only one WRDIGI code,
followed by a WRDIGINP, is sent every other line.
On alternate lines, two WRDIGINP are sent and will
select the last address that was written, without
incrementing the 82750PB bitmap address pointer.
This is described in detail in Chapter 3.


WRDIGI (0011) This command requests a write of
digitized data. The operation of this command is
dependent upon the external hardware and is
discussed in the section on genlocking (page 29). If
digitizing is enabled, this command is asserted on
the VBUS for a programmable number of T-cycles.
The pointer is then incremented by a pitch value.
Since each horizontal line is stored in a single row of
memory, this pitch value is equal to the horizontal
resolution, in bytes, for non-interlaced bitmaps. For
interlaced bitmaps, the pitch value is equal to twice
the horizontal resolution, in bytes. This allows
alternate lines of data to be skipped over in successive
fields.
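
The pitch rule above is simple arithmetic, sketched here for illustration. The function names and the idea of listing successive line start addresses are assumptions added for the example. Only the relationship between pitch and horizontal resolution comes from the text.

```python
def digitizer_pitch(horizontal_resolution_bytes, interlaced):
    """Pitch added to the bitmap pointer after each WRDIGI.

    Non-interlaced: one line of bytes. Interlaced: two lines of bytes,
    so each field skips the lines that belong to the other field.
    """
    return horizontal_resolution_bytes * (2 if interlaced else 1)


def line_start_addresses(start, pitch, line_count):
    """Hypothetical helper: addresses written by successive WRDIGI codes."""
    return [start + i * pitch for i in range(line_count)]
```

With a 640-byte line, an interlaced bitmap uses a pitch of 1280, so one field writes lines at offsets 0, 1280, 2560 and so on, leaving the intervening lines for the other field.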


WRDIGINP (0111) This command allows access
to digitized data without performing a pitch calculation.
WRDIGINP (0111) requests that the 82750PB
perform a transfer request at the last calculated
address. Note that only a memory transfer cycle is
performed; the pitch value is not added to this
address. This will always ensure that the digitized data
is written into the last selected memory address, in
case a physical memory boundary has been
crossed. This command is asserted after the
WRDIGI transfer has completed.


Refresh and Control Commands


The following signals are used to pass refresh re- 
quests and control information to the 82750PB. 


DFL (1000) The Display Format Load command is
a maskable host processor interrupt that can be
programmed to occur at any time during the display.
This is used by the 82750PB to transfer the shadow
register contents into the working register set in the
VRAM interface. This is useful in supporting
split-screen-type applications, where it is desirable to
change the bitmap pointers at some point before the
end of the display.


82750DBSD (1001) This command is the
82750DB Shut Down code. During every register
transfer, the 82750DB keeps an internal vertical
exclusive-OR checksum of the register data as it is read
onto the chip. The last word of data that is read
during the register transfer is the user-generated
checksum. If the two checksums match, operation
proceeds as normal. If they do not match, the
82750DB enters the reset state and sends this code
to the 82750PB. The 82750DB will remain reset until
the reset pin is asserted and negated by the host
processor.


REFRESH (1010) This command asks the
82750PB to generate up to 15 refresh cycles every
horizontal line. The 82750DB transfer cycles have a
higher priority than refresh requests in the 82750PB.
REFRESH will not be asserted if programmed to
occur at the same time as a transfer request code.


The following codes are used to pass the video line
and field information from the 82750DB to the 82750PB
processor.


VEVEN (1101) This code indicates the start of an
even (i.e. second) field of a frame. This command is
sent coincident with line one of each even field.
When genlocking to an external source (see page 29),
the occurrence of a vreset signal during programmed
horizontal active time will cause the 82750DB to
output a VEVEN code on the VBUS.


VODD (1100) This code indicates the start of an
odd (i.e. first or only) field of a frame. This command
is always sent immediately after RESETB# is
negated, and coincident with line one of the odd field.
Similarly, when genlocking, the occurrence of a
vreset signal during any time other than horizontal
active time will cause the 82750DB to output a
VODD code on the VBUS.


HLIN (1110) This code marks every horizontal line
at a programmable point in the line. HLIN is used by
the 82750PB to increment its horizontal line counter.
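
For reference, the VBUS codes described in this section can be collected into one table. The sketch below does so with a Python enumeration and orders pending transfer requests by the stated priority (REGX, then VUXFER, then BMX/YBMNPX). The identifier names, including `DBSD` for the 82750DBSD code, and the helper function are illustrative choices, not part of the device documentation.

```python
from enum import IntEnum


class VbusCode(IntEnum):
    """4-bit codes driven on VBUS[3:0], as listed in this section."""
    BMX = 0b0000
    VUXFER = 0b0001
    REGX = 0b0010
    WRDIGI = 0b0011
    YBMNPX = 0b0100
    WRDIGINP = 0b0111
    DFL = 0b1000
    DBSD = 0b1001  # the 82750DBSD shut down code
    REFRESH = 0b1010
    VODD = 0b1100
    VEVEN = 0b1101
    HLIN = 0b1110


# Lower rank is sent first; BMX and YBMNPX share the lowest priority.
_TRANSFER_RANK = {
    VbusCode.REGX: 0,
    VbusCode.VUXFER: 1,
    VbusCode.BMX: 2,
    VbusCode.YBMNPX: 2,
}


def order_transfer_requests(pending):
    """Return the pending display transfer requests in priority order."""
    return sorted(pending, key=lambda code: _TRANSFER_RANK[code])
```

Only the three display transfer requests are ranked here, because the text gives a relative priority for those codes alone.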
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Pixel Processing Path 


This logic accepts the 32-bit word from the input
latch and divides the word into the programmed pixel
format. This will result in either four 8-bit pixels,
two 16-bit pixels, one 32-bit pixel, or an 8-bit pixel
with an 8-bit alpha value (pseudo 16-bit mode). The
pixels act as addresses to the color table, or may
bypass the table completely as described below.
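
The division of the 32-bit word into pixels can be illustrated with a short sketch. It assumes that the earliest pixel occupies the lowest-order bits of the word; this section does not state the pixel order, so that ordering, like the function names, is an assumption made for the example. The pseudo 16-bit split (low byte pixel, high byte alpha) follows the mode descriptions later in this chapter.

```python
def split_pixels(word, bits_per_pixel):
    """Split a 32-bit word into pixels of 8, 16, or 32 bits.

    Assumption: the first pixel is taken from the lowest-order bits.
    """
    if bits_per_pixel not in (8, 16, 32):
        raise ValueError("bits_per_pixel must be 8, 16, or 32")
    mask = (1 << bits_per_pixel) - 1
    return [(word >> shift) & mask for shift in range(0, 32, bits_per_pixel)]


def split_pseudo16(word):
    """Pseudo 16-bit mode: return (pixel, alpha) pairs.

    The low 8 bits of each 16-bit pixel word are pixel information and
    the high 8 bits are alpha information.
    """
    return [(p & 0xFF, p >> 8) for p in split_pixels(word, 16)]
```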


Pixel information may be mixed with the output of
the VU interpolator, which outputs interpolated samples
derived from a reduced sample bitmap. The
least significant bit of Y or LSB of U can be programmed
to act as a switch between using the explicit
pixel value of YUV or using the luminance portion
of the pixel with the VU portion obtained from
the interpolator. If the value of the LSB of Y (or U,
whichever is selected) is zero, the pixel data is used.
If the LSB of Y (or U) is one, the output of the VU
interpolator is used. Note that if the LSB of Y is used
as the switch flag, the luminance portion of the word
will be only 7 bits wide.


The alpha information is also processed in this
block. The alpha data may come from one of two
sources: it may be explicitly coded in the pixel word,
as is the case in the 32-bit/pixel and pseudo 16-bit/pixel
mode, or it may be obtained by comparing the
Y portion of the pixel with a preprogrammed value
and outputting one preprogrammed value if they
match and a different value if they do not match.
This latter capability is known as Alpha Trap.
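
The Alpha Trap comparison amounts to a single conditional, shown here as a sketch. The parameter names are hypothetical stand-ins for the preprogrammed trap value and the two preprogrammed alpha outputs.

```python
def alpha_trap(y, trap_value, alpha_if_match, alpha_if_no_match):
    """Derive alpha by comparing the Y portion of a pixel with a trap value."""
    return alpha_if_match if y == trap_value else alpha_if_no_match
```

For example, a trap value of 16 with outputs 0 and 255 would give an alpha of 0 only to pixels whose Y equals 16.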


VU Interpolation 


When VU interpolation is enabled by the programmer,
and when the display is in the active region,
"VU data" will be fetched, as required by the interpolator
(by the mechanisms discussed previously in
the section titled "VBUS Code Description"). This
data has the format V, V, ..., V, U, U, ..., U where
each V or U is 8 bits, and the bytes are grouped into
32-bit double-words with earliest in lowest order.
The number, "N", of V bytes and U bytes is the
same; N is programmed to be either 256 samples, or
one of 32 to 192 samples in 32-byte increments.


The first V data and the first U data fetched on the
first line of VU interpolation supply the VU value for
the first active pixel on that line. All the other VU
pairs that are fetched define values for the grid of
pixels defined below and to the right of this one by
the VU expansion factor (every other or every fourth
pixel, horizontally and vertically). Most other VU values are
filled in recursively by interpolation. Wherever there
is a pixel which lies between two pixels with known
values, it is given the value of the weighted average
of the known values. Values are understood to be
non-negative integers. When the final value is output,
any fractions are truncated or rounded to the
closest odd integer according to the programmed
value of the interpolation round flag. This process is
iterated until all pixels have assigned color values. If
the number of VU data samples loaded into the
82750DB is not enough to cover the active display
area, then the last data sample will be replicated
horizontally across the active display window.


As mentioned previously in the VBUS Control dis- 
cussion, each line of VU data can be used twice by 
setting the Line Replicate bit in the Miscellaneous 
Control register. Also, each horizontal VU sample 
can be replicated by setting the VU Replicate bit in 
the Pixel Control register. This will cause the V and 
U pixels generated by the VU interpolator every pixel 
time to be used twice. This can result in an effective 
8X horizontal expansion, which is useful when hori- 
zontal blanking time is at a premium. This bit affects 
the horizontal interpolation algorithm only, and will 
not affect the line loading sequence for VU during 
the active display. 


When interpolation is turned on by the programmer
(by specifying a non-zero number of samples to be
fetched), VU interpolation may nevertheless be
disabled for each pixel if the following conditions are
met:


1. Conditional interpolation has been selected by
   the programmer,

   AND

   Either of the two user-programmed conditions:

   a. Switching on the LSB of U has been
      selected, and the lowest-order bit of the U value
      fetched for the upper left pixel in the block
      has value zero. This allows switching to occur
      on a 2 x 2-pixel or 4 x 4-pixel grid, depending
      on the expansion mode the user has selected.
      The full 8 bits of Y and V are used, but the
      usable space of U has been decreased to 7
      bits.

   b. Switching on the LSB of Y has been
      selected, and the low order bit of the Y value
      for the current pixel has a value of zero.


2. Display of fetched and interpolated VU values
   may also be suppressed by setting the Interpolation
   Output Enable bit (in the Miscellaneous Control
   register) to zero. This will allow VU data to be
   loaded into the VU line stores without displaying
   VU data. This is useful when a mid-screen transition
   is made between two interpolation modes,
   to compensate for the vertical latency of the
   interpolation process.
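
The per-pixel conditions listed above can be read as a small decision function. The sketch below is one interpretation of that list. The argument names and the use of the strings "Y" and "U" to select the switch source are hypothetical.

```python
def uses_interpolated_vu(conditional, switch_on, y, u_upper_left,
                         output_enable=True):
    """Decide whether a pixel displays VU values from the interpolator.

    conditional:    conditional interpolation has been selected
    switch_on:      "Y" or "U", the programmed switch source
    y:              Y value of the current pixel
    u_upper_left:   U value fetched for the upper left pixel of the block
    output_enable:  state of the Interpolation Output Enable bit
    """
    if switch_on not in ("Y", "U"):
        raise ValueError('switch_on must be "Y" or "U"')
    if not output_enable:
        return False  # VU data is still loaded, but not displayed
    if not conditional:
        return True
    flag = (u_upper_left if switch_on == "U" else y) & 1
    return flag == 1  # an LSB of zero disables interpolation for the pixel
```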


Color Lookup Table (CLUT)
Operation


The 82750DB contains three 256 x 8-bit color lookup
tables. The color maps can be accessed separately,
or may act as one large 256 x 24-bit table.
The manner in which the tables are addressed is
determined by the programmed bits/pixel and depends
on whether the pixel is a graphics or video
pixel. Also each Y, U, and V color table address can
be masked. The masks can be used in all the bit/pixel
modes, but are most useful with the 16-bit/pixel
mode. In this mode, the mask allows the YUV
values to be mapped to 8-bit values instead of 6-5-5.
Each channel (Y, U, V) has a MASK SET register
and a MASK DATA register that select the color
lookup address bit to be changed and the new value
of the bit, respectively. A simple mask operation on
one channel is illustrated in Figure 2-8.


The CLUT address mask operation is determined by
a logical equation given by:


Result = (mask set AND mask data) OR ((NOT mask set) AND data)


Each bit of the Result byte is determined individually
by this equation. The Result byte is then further
processed in order to produce the CLUT RAM address.
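
The mask operation can be tried out with the sketch below. It follows the register descriptions given earlier (MASK SET selects the address bits to be changed, MASK DATA supplies their new values), which implies that unselected bits pass through from the incoming address. That reading of the equation, and the function name, are assumptions.

```python
def mask_clut_address(data, mask_set, mask_data):
    """Apply a per-channel CLUT address mask to an 8-bit address.

    Bits selected by mask_set take their value from mask_data; all
    other bits keep the value of the incoming address.
    """
    return ((mask_set & mask_data) | (~mask_set & data)) & 0xFF
```

In the 16-bit/pixel mode, for example, a Y address padded to `0b10110000` with MASK SET `0b00000011` and MASK DATA `0b00000010` yields `0b10110010`, supplying the two low-order bits that the 6-bit Y field lacks.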


Figure 2-8. Mask Operation on CLUT Address
(MASK SET Register, 0x41; MASK DATA Register, 0x42)


For modes that require both video and graphics to
pass through the color table, the table can be split
into two halves: one half for graphics and the other
for video pixels. By using the SPLITCLUT bit in the
Miscellaneous Control register in conjunction with
the LSB of Y or U, the color table address is forced
to either the video table or graphics table automatically.
In this case, the masking operation is still used,
but the address is forced to either an even or odd
entry, regardless of the results of the masking
operation. The flag bit that decides between the two
types of pixels automatically selects the correct portion
of the CLUT table for a single channel. Note the
LSB of Y or U selects the proper half of the CLUT for
that single component. The SPLIT CLUT mode assures
the proper half of the CLUT is used for all
three components.


The color table can also be bypassed when
displaying either graphics or video, independent of
the programmed bits/pixel. This is programmed by
the user via the VIDEO PASS and GRAPHICS PASS
bits in the Miscellaneous Control register. Table 2-3
summarizes the various modes when using the
CLUT.
Table 2-3. CLUT Modes


Colormap Address
Masked Video Data
Even Address Only (Graphics)
Odd Address Only (Video)
CLUT Not Used at All




When writing to the CLUT, the most significant byte 
of the data word corresponds to the address, and 
the least significant 24 bits are the YUV data (least 
significant to most significant, respectively). An in- 
dex register is used to allow the 6-bit address to be 
mapped to an 8-bit number. (Refer to Chapter 4 for 
more information.) By resetting the 82750DA Dis- 
able bit, it is possible to make the CLUT look like the 
reduced entry color lookup table on the 82750DA. 
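
The layout of a CLUT write word can be sketched as follows. The code reads "least significant to most significant, respectively" as placing Y in bits 7:0, U in bits 15:8, and V in bits 23:16, with the address in bits 31:24. The function name is hypothetical, and the index register mentioned above is not modeled.

```python
def pack_clut_write(address, y, u, v):
    """Build the 32-bit data word for one CLUT entry.

    Bits 31:24 hold the address and bits 23:0 hold the YUV data,
    with Y in the least significant byte.
    """
    for name, value in (("address", address), ("y", y), ("u", u), ("v", v)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    return (address << 24) | (v << 16) | (u << 8) | y
```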


The following paragraphs summarize the possible
bit/pixel modes, using the LSB of Y or U switching
ability and the various graphics and video bypass
modes. Note that there are modes where the LSB of
Y or U is not used to switch between graphics and
video.


8-BIT/PIXEL GRAPHICS MODE 


This is the graphics-only mode, in which the 8 bits 
are used as inputs to all three color tables. This 
makes the color maps look like a single, 256 x 24-bit 
CLUT and allows 256 unique colors from a palette of 
16 million to be available at any given time. If the 
Graphics Pass bit is asserted, the CLUT will be by- 
passed and the 8-bit values of the Y, U, and V chan- 
nels will be input to each channel of the converter 
matrix. 


8-BIT/PIXEL VIDEO MODE 


When used with subsampled VU information from
the interpolator, the 8 bits are actually a luminance
value. The Y portion addresses the Y color table, V
the V color table, and U the U color table. By using
the color table, a one-to-one mapping exists, allowing
non-linear transformations to be applied to the
pixel data to enhance the quality of the reconstructed
image. By asserting the VIDEOPASS bit in the
Miscellaneous Control register, the color table can
be bypassed.


8-BIT/PIXEL MIXED MODE 


In the 8-bit/pixel mixed mode the LSB of Y or U is
used as a switch flag to change the index to the
color tables. When the switch flag is set to a one,
the Y value corresponds to a luminance value, and
the VU values are the chrominance information obtained
from the VU interpolator. In this case each
video component is used as an address to its corresponding
CLUT as described above. When the
switch flag is set to a zero, the VU values are not
used and the Y value is used as the address to all
color tables. These pixels are treated the same as in
the 8-bit/pixel graphics mode.


In this mode the applications programmer must en- 
sure that the proper information has been loaded 
into specific areas of the color maps. For example, 
all the video pixels will use the odd address values. 
By restricting the address used in the graphics and
video mode, two unique maps may coexist in the 
tables. One map is used for non-linear transforma- 
tions on video data, and the other for graphics color 
lookup table applications. 


As illustrated above, the CLUT can be bypassed by 
asserting either or both of the bypass controls. 


PSEUDO 16-BIT/PIXEL GRAPHICS MODE 


In the pseudo 16-bit/pixel graphics mode each 
32-bit data word is made up of two, 16-bit pixel 
words. The 82750DB processes each 16-bit pixel 
word, so that the least significant 8 bits correspond 
to pixel information, and the most significant 8 bits 
are used as alpha information. The 82750DB uses 
the lower 8 bits as inputs to all three color tables. 
This makes the color maps look like a single, 256 x 
24-bit color table. If the Graphics Pass bit is assert- 
ed, the CLUT will be bypassed and the 8-bit values 
of the Y, U, and V channels will be input to each 
channel of the converter matrix. 


PSEUDO 16-BIT/PIXEL VIDEO MODE 


When used with subsampled VU information, the 
least significant 8 bits of the pixel word are actually a 
luminance value. The most significant 8 bits are 
used as alpha information. The VU information is 
generated by the 82750DB interpolator. Each of the 
color maps uses the corresponding 8-bit video component
as an address. By asserting the Video Pass
bit in the Miscellaneous Control register, the color 
table can be bypassed. 


PSEUDO 16-BIT/PIXEL MIXED MODE 


In this mode the LSB of Y or U is used as a switch flag
to change the index to the color tables. When the
LSB of Y or U is set to a one, the lower 8-bit value
corresponds to a luminance value, and the V and U
values are the chrominance information. In this
case, each video component of the 82750DB is
used as a colormap address as described above.
When the LSB of Y or U is set to zero, the V and U
values from the interpolator are not used, and the Y
value is used as the address to all color tables.


16-BIT/PIXEL GRAPHICS MODE


The 16-bit pixel word is broken up on the 82750DB
to yield 6 bits of Y, and 5 bits each of V and U. The Y
bits are the least significant, and the U bits are the
most significant. These values are then padded with
zeros in the lower order bits, to obtain an 8-bit word
for each pixel component. Each component addresses
its respective CLUT. However, the Y channel
may access only 64 unique locations, and 5-bit
resolution for VU restricts them to 32 unique locations
each. The address range may be extended by
using the colormap mask registers to add 2 bits of
precision in the least significant bits for Y and 3 least
significant bits each for VU channels. This allows the
programmer to access all the entries in the color
table by reprogramming the MASK DATA and MASK
SET registers during the blanking interval.


16-BIT/PIXEL VIDEO MODE


This mode works like the 8-bit/pixel video mode described
above, except that the 82750DB has processed
the information so that the Y channel contains
the least significant 8 bits of the 16-bit data
word. The V and U information is generated by the
VU interpolator. If the SPLITCLUT mode is selected,
the LSB of the address is forced to an odd entry in
the three color tables.


16-BIT/PIXEL MIXED MODE


When the switch flag is zero, the graphics mode is
selected and the inputs to the CLUT are the respective
YUV data in the 6-5-5 format. These pixel values
are extended by using the colormap masking registers.
When the switch flag indicates the video mode,
the lower 8 bits of the 16-bit pixel word and the VU
values obtained from the interpolator are input to their
respective CLUTs. If the SPLITCLUT mode is selected,
the LSB of the address is forced to either an odd
or even entry in the three color tables, depending on
whether the data is video or graphics information.


32-BIT/PIXEL GRAPHICS MODE


Eight bits each of Y, U, and V are used as addresses
to each segment of the color table. Since the size of
the addressable color space is not increased, the
advantage of using the color map is for special effects
or gamma correction. The most significant 8
bits of the 32-bit data word are used for the alpha
channel data. If the Graphics Pass bit is asserted,
the CLUT will be bypassed and the 8-bit values of
the Y, V, and U will be input to each channel of the
converter matrix.


32-BIT/PIXEL VIDEO MODE


The Y channel contains the least significant 8 bits of
the 32-bit data word. The U and V information is
generated by the VU interpolator. The YUV channels
are input to their respective color tables. The size of
the addressable color space is not increased, but
this can be used to take advantage of a non-linear
transformation, which may aid in the decompression
process. The most significant 8 bits of the data word
are used for the alpha channel data.


32-BIT/PIXEL MIXED MODE 


When the switch-flag is zero, the graphics mode is 
selected, and the inputs to the CLUT are the respec- 
tive 8 bits each of YUV data. These pixel values may 
be masked by using the colormap mask data and 
mask set registers. When the switch flag indicates 
the video mode, the lower 8 bits of the pixel word 
and the VU values obtained from the interpolator are 
input to their respective CLUTs. If the SPLITCLUT 
mode is selected, the LSB of the address is set to 
either an odd or even entry in the three color tables, 
depending on whether the data is video or graphics 
information. The most significant 8 bits of the data 
word are used for the alpha channel data. 
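
The 6-5-5 unpacking described for the 16-bit/pixel graphics mode can be sketched as below. The text states that Y occupies the least significant bits and U the most significant; placing V in the 5 bits between them is an inference, and the function name is hypothetical.

```python
def unpack_655(pixel):
    """Split a 16-bit 6-5-5 pixel into zero-padded 8-bit (y, u, v) values.

    Assumed layout: Y in bits 5:0, V in bits 10:6, U in bits 15:11.
    Each field is padded with zeros in its low-order bits.
    """
    if not 0 <= pixel <= 0xFFFF:
        raise ValueError("pixel must fit in 16 bits")
    y = (pixel & 0x3F) << 2
    v = ((pixel >> 6) & 0x1F) << 3
    u = ((pixel >> 11) & 0x1F) << 3
    return y, u, v
```

Because the padding bits are always zero, Y can reach only 64 distinct table entries and V and U only 32 each, which is the limitation the colormap mask registers are described as working around.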
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Y interpolator 


The Y Interpolator performs a 2X horizontal linear
interpolation on each line of Y values. When Y interpolation
is enabled, the internal pixel clock is twice
the frequency of PIXCLK output.


NOTE:

If Y interpolation is enabled, then only the integer
values of pixel times greater than 1X may be
used.


The interpolation may be separately controlled for 
both video and graphics pixels, via the Viden and 
Gren bits (bits 12 and 11) of the General Control 
register. A video pixel is defined as one generated 
using VU interpolated values. A graphics pixel does 
not use the VU interpolator. The effects of setting 
the control bits, the 82750DB enable flag, and vid- 
eo/graphics pixel switch (V/G Switch) on the output 
of the interpolator are summarized in Table 2-4. 


Because of the asymmetric nature of the internal 
pixel clock used on 82750DB, the number of T-cy- 
cles between successive Y pixels varies depending 
on the programmed pixel width. When enabled, 
there is a pipeline delay through the Y Interpolator 
equal to the number of T-cycles between each internal
pixel clock.


When the interpolator is bypassed as described
above, there is a fixed delay through this block. The
V and U data are delayed by one pixel clock to allow
the chroma data to line up with the luminance data.
Other control signals, such as the register address
byte (most significant byte of the 32-bit data word
read from VRAM), the pixel clock, horizontal and
vertical active displays, composite blanking, and register
load enable signals are also delayed by one
pixel clock in order to line up with the YUV data. The
programmer must ensure that the active display timing
is programmed to take the appropriate delay
through the Y Interpolator into account.


Table 2-4. Control Bit Settings and
Resulting Interpolator Output


Output
Interpolator Bypassed
Interpolator Bypassed
Interpolate Graphics Pixel
Interpolate Video Pixel
Interpolate Video Pixel
Interpolate Graphics Pixel
Interpolate Both Video and Graphics Pixels


Cursor 


Hardware support for a 16 x 16-pixel cursor has
been included on the 82750DB. The cursor is capable
of providing sharp color transitions when using
subsampled VU bitmaps. Software intervention is
minimized, leaving the host with more processing cycles
to perform other operations.


Under normal operation, the XY starting display position
of the cursor is loaded into the Cursor Control
register during an 82750DB register load. On the display
line corresponding to the Y start position, the
cursor is displayed when the X starting position
(specified in T-cycles) is reached. On the following
15 lines, the cursor will be displayed at this X position
every line, for both interlaced and non-interlaced
displays.


A normal 82750DB register transfer is used to load
the entire 16 x 16 x 2 bits (16 words of 32 bits each)
of cursor data. During this register transfer, the cursor
data is distinguished from normal register data
by placing the Cursor Control register immediately
before the 16 words of cursor data. When the
82750DB loads the Cursor Control register, it will
interpret the next sixteen 32-bit words of register data
as the cursor bitmap, and will disable the other registers
on the 82750DB from decoding the address
field of the 32-bit data word. (The checksum of the
82750DB register data is not performed during the
loading of the cursor bitmap data.) The cursor bitmap
will be loaded a line at a time, starting at line
zero and continuing in sequential order to line 15.
Each line in the cursor map actually contains sixteen
2-bit cursor pixels, with the two least significant bits
corresponding to the first cursor pixel in that line,
and the two most significant bits corresponding to
the 16th cursor pixel on that line. Each 2-bit pixel
may select one of the three Cursor Color registers or
transparency, according to the format indicated in
Table 2-5.


Table 2-5. Cursor Color Registers

Cursor Pixel | Output
             | Transparency (Cursor Pixel Not Displayed)
             | Cursor Color Register 1
             | Cursor Color Register 2
             | Cursor Color Register 3
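
The packing order of a cursor line (first pixel in the two least significant bits, 16th pixel in the two most significant bits) can be sketched in Python. This is only an illustration: the function names are hypothetical, and the numeric pixel codes (0 for transparency, 1 to 3 for the Cursor Color registers) are an assumption that should be checked against Table 2-5.

```python
# Assumed 2-bit pixel codes; verify against Table 2-5 before relying on them.
TRANSPARENT = 0
COLOR_1, COLOR_2, COLOR_3 = 1, 2, 3


def pack_cursor_line(pixels):
    """Pack sixteen 2-bit cursor pixels into one 32-bit word.

    pixels[0] is the first (leftmost) cursor pixel and lands in bits 1:0;
    pixels[15] lands in bits 31:30.
    """
    if len(pixels) != 16:
        raise ValueError("a cursor line holds exactly 16 pixels")
    word = 0
    for index, pixel in enumerate(pixels):
        if not 0 <= pixel <= 3:
            raise ValueError("cursor pixels are 2-bit values")
        word |= pixel << (2 * index)
    return word


def pack_cursor_bitmap(rows):
    """Pack 16 lines (line 0 first) into the 16 words that follow
    the Cursor Control register in a register load."""
    if len(rows) != 16:
        raise ValueError("the cursor bitmap holds exactly 16 lines")
    return [pack_cursor_line(row) for row in rows]
```

For example, a line whose only non-transparent pixel is the first one packs to the word 0x00000001, while a line whose only non-transparent pixel is the 16th packs into bits 31:30.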


Three 24-bit color registers that hold the color infor- 
mation for the cursor may be written to at any time 
during the register load. The cursor may be loaded 
any time during the blanking intervals of the display. 
For displays that do not program the cursor during 
the display, the cursor bitmap may be loaded during 
the vertical blanking interval. 


When the T-cycle count equals the value programmed into the X start
position of the Cursor Control register, the first cursor pixel can
be displayed.




Each 2-bit cursor pixel will select one of the three Cursor Color
registers or transparency. The 24-bit output of one of the three
color registers (or the actual display pixel data if transparency is
used) is input to the YUV converter.


The cursor bitmap length is 16 lines, and the width is 16 pixels.
Although the length of the cursor may be changed dynamically by
chaining register loads to update the cursor map, the size of the
cursor is dependent on the type of display. For interlaced displays,
each line of cursor data will appear on the same line of each field.
This results in a cursor of 16 x 32 pixels. For non-interlaced
displays, the same line of cursor information will appear on the same
line every field. The cursor in this case will be 16 x 16 pixels. The
size of the cursor may be doubled independently in the horizontal
and/or vertical direction by setting the 2X Horizontal Cursor or 2X
Vertical Cursor bit in the General Control register. In this case, no
new data is loaded into the cursor map; the data is just replicated
in the corresponding dimension. Table 2-6 summarizes some of the
possible cursor sizes. Note that by loading the cursor bitmap with
different data at the start of every field, cursor sizes not listed
below may be achieved.


Table 2-6. Cursor Sizes

2X Horz. | 2X Vert. | Display        | Cursor Size
Off      | Off      | Interlaced     | 16 x 32
On       | Off      | Interlaced     | 32 x 32
Off      | On       | Interlaced     | 16 x 64
On       | On       | Interlaced     | 32 x 64
Off      | Off      | Non-interlaced | 16 x 16
On       | Off      | Non-interlaced | 32 x 16
Off      | On       | Non-interlaced | 16 x 32
On       | On       | Non-interlaced | 32 x 32
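
The sizing rule described above (16 x 16 bitmap, doubled vertically on interlaced displays, and optionally doubled again in either direction) can be written as a small Python helper. The function name and argument names are hypothetical; the sketch only restates the rule given in the text.

```python
def cursor_size(interlaced, double_horizontal=False, double_vertical=False):
    """Return (width, height) of the displayed cursor in pixels.

    The 16 x 16 bitmap appears on the same line of each field, so an
    interlaced display shows it 32 lines tall. The 2X Horizontal Cursor
    and 2X Vertical Cursor bits each replicate the data once more.
    """
    width = 16 * (2 if double_horizontal else 1)
    height = 16 * (2 if interlaced else 1) * (2 if double_vertical else 1)
    return width, height
```

Sizes outside this rule, obtained by reloading the cursor bitmap at the start of every field, are not covered by the helper.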


There is a complex relationship between the cursor and the pixel
data, especially when using non-integral divisors of the pixel
clocks. Since the pixel data output from the 82750DB pixel path
always changes coincident with the rising edge of the clock, the
cursor start position must be positioned on the rising edge of any
period of the pixel clock. The programmer must enforce the
corresponding restrictions on the start and stop position of the
cursor.


YUV to RGB Converter


The following equations give the theoretical relation- 
ship between analog RGB components, R, G, B, and 
analog YUV components, Y, U, V. 


Y = 0.298822 R + 0.586816 G + 0.114363 B               (1a)
V = R - Y = 0.701178 R - 0.586816 G - 0.114363 B       (1b)
U = B - Y = -0.298822 R - 0.586816 G + 0.885637 B      (1c)
where: 0.0 ≤ G, R, B ≤ 1.0
       0.0 ≤ Y ≤ 1.0
       -0.701 ≤ V ≤ +0.701
       -0.886 ≤ U ≤ +0.886


Solving for G, R, B, we can obtain the inverse relationship:

G = Y - 0.509228 V - 0.194888 U      (2a)
R = Y + V                            (2b)
B = Y + U                            (2c)
where: 0.0 ≤ G, R, B ≤ 1.0
       0.0 ≤ Y ≤ 1.0
       -0.701 ≤ V ≤ +0.701
       -0.886 ≤ U ≤ +0.886


The luminance channel for the YUV inputs is presumed to swing between
0.0V and 1.0V. However, the chroma components do not and need to be
normalized to a 0V to 1V range. The offset binary encoding used to
obtain unsigned numbers must also be accounted for. This encoding
should center the V and U inputs at the midpoint of the voltage
range. The equations for the normalized version of Y, V, and U (Y',
V', and U' respectively) are:

Y' = Y                         (3a)
V' = 0.5 V / 0.701 + 0.5       (3b)
U' = 0.5 U / 0.886 + 0.5       (3c)
where: 0.0 ≤ Y', V', U' ≤ 1.0
       0.0 ≤ Y ≤ 1.0
       -0.701 ≤ V ≤ +0.701
       -0.886 ≤ U ≤ +0.886

When converting the normalized analog values Y', V', U' to digital y,
v, u values, the D.C. offset and conversion ranges are compatible
with the CCIR 601 standard for digital video. The ranges for the
components and the corresponding Digital to Analog equivalent
equations are given below:


y = (235 - 16) Y' + 16      (4a)
where: 16 ≤ y ≤ 235

v = (240 - 16) V' + 16      (4b)
where: 16 ≤ v ≤ 240

u = (240 - 16) U' + 16      (4c)
where: 16 ≤ u ≤ 240


Substituting the normalized analog voltages of Equation 3 into
Equation 4, we obtain the digital version of the input data, used in
the DVI™ Technology system:

y = 219 Y + 16               (5a)
v = 112 V / 0.701 + 128      (5b)
u = 112 U / 0.886 + 128      (5c)
where: 0.0 ≤ Y ≤ 1.0
       -0.886 ≤ U ≤ 0.886
       -0.701 ≤ V ≤ 0.701
       16 ≤ y ≤ 235
       16 ≤ v, u ≤ 240

By solving equations 5 for Y, U, V, and substituting 
into Equation 2, we get the relationship between an- 
alog R, G, B and the digital DVI y, u, v data: 


G = 0.004566 y - 0.003187 v - 0.001541 u + 0.532242      (6a)
R = 0.004566 y + 0.006259 v - 0.874202                   (6b)
B = 0.004566 y + 0.007911 u - 1.085631                   (6c)
where: 0.0 ≤ R, G, B ≤ 1.0
       16 ≤ y ≤ 235
       16 ≤ v, u ≤ 240

If the inputs of the Digital to Analog Converter are scaled to
accommodate the nominal input range of 0 to 219, we obtain the
following relationship between the inputs to the DVI Technology
system (y, v, u) and the inputs to the Digital to Analog Converters
(r, g, b). Note that all out-of-range RGB values (> 255 or < 0 due to
excursions in the inputs) are clipped to 255 or 0.


g = y - 0.698001 v - 0.337633 u + 116.56116      (7a)
r = y + 1.370705 v - 191.45029                   (7b)
b = y + 1.732446 u - 237.75314                   (7c)
where: 16 ≤ y ≤ 235
       16 ≤ v, u ≤ 240
       0 ≤ g, r, b ≤ 255

By substitution of Equation 5 into Equation 1, and by converting G,
R, and B to digital values, we can obtain the inverse relationship of
Equation 7:

y = +0.298822 r + 0.586816 g + 0.114363 b + 16       (8a)
u = -0.172486 r - 0.338721 g + 0.511206 b + 128      (8b)
v = +0.511545 r - 0.428112 g - 0.083434 b + 128      (8c)
where: 16 ≤ y ≤ 235
       16 ≤ v, u ≤ 240
       0 ≤ g, r, b ≤ 255
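
Equations 7 and 8 can be checked numerically with a short Python sketch. The function names are hypothetical and the code is a floating-point illustration of the published coefficients, not a model of the on-chip converter; rounding to the nearest integer before clipping is an assumption made here.

```python
def rgb_to_yvu(r, g, b):
    """Equation 8: DAC-scaled r, g, b (nominally 0..219) to digital y, v, u."""
    y = 0.298822 * r + 0.586816 * g + 0.114363 * b + 16
    u = -0.172486 * r - 0.338721 * g + 0.511206 * b + 128
    v = 0.511545 * r - 0.428112 * g - 0.083434 * b + 128
    return y, v, u


def yvu_to_rgb(y, v, u):
    """Equation 7: digital y, v, u to unclipped DAC inputs r, g, b."""
    g = y - 0.698001 * v - 0.337633 * u + 116.56116
    r = y + 1.370705 * v - 191.45029
    b = y + 1.732446 * u - 237.75314
    return r, g, b


def clip_to_dac(value):
    """Clip an out-of-range result to 0 or 255, as the text describes.
    Rounding to the nearest integer first is an assumption."""
    return max(0, min(255, round(value)))
```

With these definitions, black (y = 16, v = u = 128) maps to r = g = b = 0 and nominal white (y = 235, v = u = 128) maps to r = g = b = 219 after rounding, which matches the nominal 0 to 219 input range stated above.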


Output Equalization


The units on the 82750DB process the pixel information at the
operating frequency of the chip. If the output pixel rate is not
equal to the maximum frequency, the units have null states during
which processing is suspended. This type of operation is necessary on
the 82750DB because of the large amount of pipelining. Table 2-7
gives the pattern of T-cycles on the 82750DB during which processing
is active, according to the programming shown in Table 4-2.


The pixel information must be output at a rate that is some
sub-multiple of the operating frequency. The divisor is programmed by
the user, and may be from 1 to 12 times the period of FREQIN, in
increments of 1/2. Divisors of 13 and 14 are also programmable.
Because non-integral divisors are used, it is necessary for the
82750DB to output different information on both phases of FREQIN.
This is illustrated in Figure 2-9, which uses a 2.5 divisor for the
clock. Notice that the pixel clock output (PIXCLK) transitions fall
alternately on the active and inactive phase of the input frequency,
while the internal pixel clock transitions always occur on the active
phase. Also note that PIXCLK does not have a 50% duty cycle.


The equalizing logic derives a clock that has a period equal to the
programmed pixel rate, providing an edge to sample the output
information. This allows the Digital to Analog Converter to directly
sample the output of the pixel data path before performing the analog
conversion.
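
The on/off patterns listed in Table 2-7 appear to follow a simple rule: an integral pixel time of N T-cycles gives one active T-cycle followed by N - 1 null T-cycles, while a half-integral pixel time alternates a short pixel and a long pixel (for 2.5, lengths 2 and 3). The Python sketch below encodes that rule. It is an inference from the listed patterns, not a description of the hardware, and the function name is hypothetical.

```python
from fractions import Fraction


def active_pattern(pixel_time):
    """Return one period of the internal pixel clock pattern as a list
    of 1 (processing active) and 0 (null state) per T-cycle.

    pixel_time is the programmed divisor in T-cycles and must be a
    multiple of 0.5 that is at least 1.
    """
    half_steps = Fraction(pixel_time) * 2
    if half_steps.denominator != 1 or half_steps < 2:
        raise ValueError("pixel time must be a multiple of 0.5, at least 1")
    n = int(half_steps)
    # Even n: every pixel lasts n/2 T-cycles.
    # Odd n: pixels alternate between floor and ceiling lengths.
    lengths = [n // 2] if n % 2 == 0 else [n // 2, n // 2 + 1]
    pattern = []
    for length in lengths:
        pattern.append(1)                   # one active T-cycle per pixel
        pattern.extend([0] * (length - 1))  # null T-cycles fill the rest
    return pattern
```

For a pixel time of 2.5 this yields 1 On/1 Off/1 On/2 Off, the pattern that corresponds to the divide-by-2.5 clock of Figure 2-9.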


Table 2-7. 82750DB Active T-Cycle Patterns

Pixel Time (T-Cycles) | Pattern Of Internal Pixel Clock
1.5                   | 1 On/1 On/1 Off
2.5                   | 1 On/1 Off/1 On/2 Off
3                     | 1 On/2 Off
3.5                   | 1 On/2 Off/1 On/3 Off
4                     | 1 On/3 Off
4.5                   | 1 On/3 Off/1 On/4 Off
5.5                   | 1 On/4 Off/1 On/5 Off
7                     | 1 On/6 Off
7.5                   | 1 On/6 Off/1 On/7 Off
8.5                   | 1 On/7 Off/1 On/8 Off
9.5                   | 1 On/8 Off/1 On/9 Off


Figure 2-9. Divide by 2.5 Pixel Clock


Digital to Analog Converters 


The Digital to Analog Converters (DACs) take three channels of video
information output from the pixel data path, converting it from 8-bit
digital values to analog voltage levels typically between 0V and 1V.
The conversion is monotonic, and a pixel clock is used to derive a
two-phase clock internal to the DAC. The data is sampled from the
output of either the pixel path, or the YUV to RGB matrix, on the
rising edge of the internal active phase of this clock.

The DISDAC input pin can be asserted to disable the analog outputs
and place them into a high-impedance state.


The analog outputs of the triple DAC are referenced to an external
current source, which must be connected to the IREFIN pin. All the
analog outputs are scaled by this current reference. The value of the
analog output full scale is as follows:

    Ifs = Iref * 25 / 18.5

where: Iref is the magnitude of the reference current.

The output voltage generated at full scale is:

    Vfs = Ifs * Rext

where: Rext is the load resistance value.

A typical output load for the analog outputs (RV, BU, GY) is 75Ω. The
speed of the DAC analog output rise and fall times is determined by
the time constant:

    Rext * (Cext + Cout)

where: Cext is the external capacitance applied and
       Cout is the intrinsic capacitance of an analog output.

For high performance the objective would be to minimize Rext and
Cext. The voltage Voutfs can be determined by any combination of Ifs
and Rext, but must not exceed 1.5V. In addition, Ifs must not exceed
22 mA. The analog outputs must go through an external buffer to drive
a doubly-terminated 75Ω coax line.
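
The full-scale relationships and limits above lend themselves to a quick design check. The Python sketch below is illustrative only: the function names are hypothetical, the 25/18.5 scale factor is taken from the full-scale equation as printed and should be confirmed against an original copy of the data sheet, and the 7.4 mA reference current in the usage note is an arbitrary example value.

```python
MAX_FULL_SCALE_CURRENT_A = 0.022   # Ifs must not exceed 22 mA
MAX_FULL_SCALE_VOLTAGE_V = 1.5     # full-scale output must not exceed 1.5V


def dac_full_scale(i_ref_a, r_ext_ohm, scale=25 / 18.5):
    """Return (Ifs, Vfs, within_limits) for a reference current in amps
    and a load resistance in ohms.

    scale is the Ifs/Iref ratio; the default is the value read from the
    full-scale equation and is an assumption to verify.
    """
    i_fs = i_ref_a * scale
    v_fs = i_fs * r_ext_ohm
    within_limits = (i_fs <= MAX_FULL_SCALE_CURRENT_A
                     and v_fs <= MAX_FULL_SCALE_VOLTAGE_V)
    return i_fs, v_fs, within_limits


def output_time_constant(r_ext_ohm, c_ext_f, c_out_f):
    """Time constant Rext * (Cext + Cout) that governs rise and fall times."""
    return r_ext_ohm * (c_ext_f + c_out_f)
```

Under these assumptions, a 7.4 mA reference into a 75Ω load gives a 10 mA full-scale current and 0.75V full-scale voltage, both inside the stated limits.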


Table 2-8 lists pins which are used to configure the triple DAC.


Table 2-8. Digital To Analog Converter Pins

IREFIN   Analog Current Reference. Must Be Decoupled to AVCC.


VG@CS Internal Voltage Reference. Must 
Be Decoupled to AVCC. 


NOTE: 


The digital video outputs must be disabled by setting DISDIG high
whenever the analog outputs are used. Otherwise the A.C. and D.C.
characteristics of the DAC are not guaranteed.


82750DB Reset Operation


Upon power-up, the 82750DB is in an indeterminate state and must be
reset. The RESETB# signal asserted by the host processor is sampled
on the rising edge of FREQIN. The 82750DB will enter the reset state
a maximum of four cycles after RESETB# is sampled. The 82750DB will
request the 82750PB to generate VRAM refresh cycles by asserting a
REFRESH code on the VBUS for 16 T-cycles. This code is repeated every
256 T-cycles until RESETB# is negated.


NOTE: 


The RESETB# input is an edge-triggered input. After power-up, the
host processor must set the RESETB# input low for a minimum of ten
T-cycles in order to reset the 82750DB. The host must then set the
RESETB# input high to start normal operation.


When the RESETB# input is released, a Start of Vertical Field command
(VODD) is sent for 16 T-cycles to the 82750PB via the VBUS. This code
is immediately followed by a Register Transfer Request command (REGX)
that is held for 256 T-cycles. This 256 T-cycle wait assures that the
82750PB has ample time to honor the 82750DB register transfer
request. The register data is then read into the 82750DB from the
serial port of the VRAMs at a rate that is equal to 1/3 of the
operating frequency. If the register transfer does not terminate
after 256 T-cycles, the 82750DB will automatically stop the transfer,
send an 82750DBSD code to the 82750PB, and re-enter the reset state.


During this register transfer, and on all subsequent register
transfers (programmed or automatic), the 82750DB performs a vertical
checksum on the register data. The last 32-bit word read in during a
register transfer is the user-generated checksum of that register
data. If the 82750DB-generated checksum does not match the
user-generated checksum, the 82750DB sends a SHUTDOWN code to the
82750PB via the VBUS, and will automatically re-enter the reset
state. The 82750DB will remain in the reset state until the RESETB#
input is toggled by the host processor. Any VRAM requests or control
signals programmed to occur during this time will be ignored.


Normal programmed operations start after the first successful
register load. Frame timing will start at the beginning of a
horizontal line and at the beginning of the first field, sometimes
referred to as line 1 of field 1. There will not be a horizontal sync
pulse on the first line after reset, but HSYNC will be generated on
every line thereafter. All horizontal and vertical programming
parameters, as well as scheduling of any transfer requests and
control information to be sent on the VBUS, must be set up by the
user during the first register load. Included in the control
information are parameters for the 82750PB to refresh the VRAM.
Refresh must occur on every line. This requires that the line rate of
the 82750DB must be at least 4 kHz to guarantee that enough refresh
cycles are generated. Additional register transfers (up to one per
line) may be programmed to occur on any line during the field. As a
result of this transfer, display characteristics and parameters may
be changed.


After the first field, automatic register transfers will occur on the
second line of each subsequent field. Note that all register
transfers will occur at 1/3 of the operating frequency of the
82750DB, unless the 1X or 1/2X SCLK mode has been programmed by the
user.


Throughout the reset process, the states of all outputs become valid
at various times. Specifically, after being held low for at least 10
T-cycles, RESETB# must transition to a high state in order to
initiate normal operation. By the time RESETB# reaches this low to
high transition, the states of SCLK[1:0], VBUS[3:0], HSYNC, VSYNC,
CSYNC, and FCO are valid. 10 T-cycles following RESETB#'s transition
from low to high, the states of BG, CB, ACTDIS, PIXCLK, DGY[7:0],
DRV[7:0], and DBU[7:0] become valid. ALPHA[7:0] and BPP[1:0] signals
reach a valid state 10 T-cycles following the completion of the first
register load following reset.


Input/Output Transformation


In general, the control outputs, including the sync signals, are
delayed by pipelining effects from their corresponding inputs. If the
output sync signals are taken as the time base, the first pixel in a
line is actually fetched by an SCLK that is up to 19 T-cycles before
its corresponding PIXCLK. Some later pixels may be delayed by an
additional number of T-cycles, depending upon bits/pixel, pixel
timing, and whether Y interpolation is enabled.


Outside of the active display region and before the blanking output
is asserted, border pixels are output. Where the blanking region has
been entered and the display is not active, the output is the value
contained in the Blanking Color register.


Pixel handling in the active region is defined by three parameters:


1. The bits/pixel parameter. 
2. Whether VU interpolation is in effect or not. 
3. If the 82750DB Enable bit has been selected. 


VU interpolation is in effect for a given pixel if: 


1. The VU interpolator is turned on (VU sample load 
set to non-zero load value), 


AND 


2. VU interpolation display is permitted (VU interpo- 
lation display operations bit equals 1), 


AND 
3. One of the two following conditions is met: 
a. Either the interpolation is unconditional, 
OR 


b. The controlling Y or the controlling U sample 
for this pixel has a least significant bit of 1. 


The value of the alpha output may come from one of the following
three sources:


1. It may be explicitly coded into the pixel data (32-bit/pixel and
   pseudo 16-bit/pixel with Alpha modes only).

2. It may be output from one of two programmable registers, Alpha0
   and Alpha1.

3. During the portion of the display when the border is active, the 8
   most significant bits of the Border Alpha register may be output.


Table 3-1 illustrates how the Alpha outputs are selected.


Table 3-1. Selecting Alpha Outputs

Alpha Trap | Alpha Select | Alpha Output
           |              | Alpha0 Register
           |              | Alpha1 Register (8, 16 bpp);
           |              | MS Byte of Pixel (32, Pseudo 16 bpp)
1          | -            | Trap Match = 0: Alpha0 Register;
           |              | Trap Match = 1: Alpha1 Register


Genlocking on the 82750DB


The genlocking algorithm on the 82750DB uses horizontal and vertical
resets, HRESET# and VRESET#, obtained from an external device. When
the Genlock bit in the Miscellaneous Control register is off, the
82750DB will ignore all signals present on its HRESET# and VRESET#
inputs. The 82750DB will resync itself when the programmed end of
line count is reached. This allows the user to turn off genlock
without having to worry about the state of the input video.


When the Genlock bit is set to one, the 82750DB will use the external
resets to reset its internal horizontal and vertical sync counters.
In this case, the width of the active line is determined by the
HRESET# signal, and the length of the field is governed by VRESET#.
The programmed values for these registers will be ignored. As shown
in Figure 3-1, when asserted, VRESET# and HRESET# take effect just
after the third falling edge of FREQIN. VRESET# has no effect on the
82750DB if the first half of the first line of an odd field or the
second (and only) half of the first line of an even field is already
in progress. HRESET# has no effect on the 82750DB if it occurs during
the programmed first half of the line. The user may decrease the
effect of jitter by reducing the "window" during which the vertical
reset signal is supposed to occur. This can be done by scheduling a
register load to occur after the vertical active display time has
ended, thereby decreasing the programmable horizontal active window
to a size acceptable for the video source. When VRESET# is received
during this reduced, programmed horizontal active window, the 82750DB
is reset to an even vertical field. When VRESET# occurs at any other
time in the horizontal scan line, the 82750DB is set to an odd field.


Figure 3-1. Horizontal and Vertical Reset Timing
(Signals shown: FREQIN, HRESET#, VRESET#. Annotation: Sample External
Reset Here.)


Digitizing Images with the 82750DB 


Digitizing is enabled by setting the Digitize Enable bit in the
Miscellaneous Control register. Note that enabling the digitize mode
does not automatically enable genlocking. The Genlock bit must be set
separately, if it is required. When digitizing, the 82750DB is used
to shift digitized data into the VRAM shift registers, and then
transfer this data into the VRAM array.


The 82750DB also provides an external "digitizer window" signal, FCO.
This signal defines the vertical active region in which the digitizer
is enabled. Typically, the user sets up the display parameters to
reflect the "window" of the display to be digitized. The horizontal
and vertical active window size can be selected by programming the
Active Start and Stop registers. FCO is derived from the Vertical
Start and Stop registers, and is used to enable the digitizer to
drive the VRAM bus. During the programmed vertical blanking interval
the FCO signal will be negated, and therefore the digitizer is
prohibited from driving the VRAM bus. This will allow data to be read
from the VRAM serial data bus during the automatic register transfer
that is performed at the start of the field. Note that it will still
be possible to program the 82750DB to digitize during the vertical
blanking interval, in order, for example, to capture time codes from
a VCR.


When capturing and displaying NTSC data, during the horizontal
blanking interval of the first display line, a WRDIGINP command is
sent on the VBUS to the 82750PB. (Refer to Figure 3-2.) Recall that
there is a 5-line vertical pipeline delay through the 82750DB. If the
first display line is programmed to be n, the first display line will
occur at n + 5. Similarly, if the last line is programmed to be m,
then the last display line will be line m + 5. The WRDIGINP VBUS code
causes a dummy write transfer cycle that places the VRAMs in the
write mode. The 82750PB then sets the bitmap pointers to the first
line's address (L0). This code is immediately followed by another
WRDIGINP command that causes the 82750PB to perform a write transfer
cycle at the L0 address. Since no digitized data has been read in,
invalid data is loaded into row L0 of the VRAM array.


During the active display of the first display line, the 82750DB
provides shift clocks at the programmed pixel rate. The digitized
data is shifted into the VRAMs while the user-programmed horizontal
active window is active. During the horizontal blanking interval of
the next line, the 82750DB sends a WRDIGI code to the 82750PB,
thereby transferring the L0 data from the shift register to the VRAM
array at the L0 address. The 82750PB performs a pitch calculation,
pointing it to the L1 row. After the WRDIGI transfer has finished,
the 82750DB issues a WRDIGINP command to the 82750PB that performs a
write transfer cycle at the L1 address. This will write the L0 data
into the L1 address. On the next line, the L1 row will be written
over with L1 data. This same procedure continues for the entire
active display, until the last active line is reached (m + 5). A
final pair of WRDIGI and WRDIGINP codes is sent to the 82750PB to
load in the last line of data. At the start of horizontal sync of the
next line, the FCO signal will be negated.


Figure 3-2. Digitizing Example
(Timeline of lines n+4 through m+6 showing FCO asserted and negated,
digitized data L0 through Lf, and the WRDIGINP and WRDIGI codes
issued between lines.)
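
The order of VBUS codes in the capture sequence just described can be summarized with a short Python sketch. It only enumerates the codes and their effect as stated in the text (two initial WRDIGINP codes, then one WRDIGI and WRDIGINP pair per captured line); the function name and the tuple format are hypothetical, and the line-replicate mode is not covered.

```python
def digitize_command_sequence(num_lines):
    """List (code, effect) pairs for capturing num_lines lines, L0 first."""
    if num_lines < 1:
        raise ValueError("at least one line must be captured")
    sequence = [
        ("WRDIGINP", "place VRAMs in write mode, set pointer to L0"),
        ("WRDIGINP", "write transfer at L0 (no valid data yet)"),
    ]
    for k in range(num_lines):
        # After line k has been shifted in, store it and advance the pointer.
        sequence.append(
            ("WRDIGI", f"transfer L{k} data to L{k}, set pointer to L{k + 1}"))
        # The following code selects the next row for the incoming line.
        sequence.append(
            ("WRDIGINP", f"write transfer at L{k + 1} (select L{k + 1})"))
    return sequence
```

For a two-line capture the codes come out as WRDIGINP, WRDIGINP, WRDIGI, WRDIGINP, WRDIGI, WRDIGINP.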


The purpose of the WRDIGINP may not be apparent at first glance. This
signal ensures that the correct data is written into the last
selected VRAM address. This is necessary when crossing boundaries of
VRAM memory.


When the 82750DB is genlocked, the digitizing device must also
provide the HRESET# and VRESET# signals. The device must ensure that
VRESET# is never asserted during the start of the line. This allows a
register transfer (which shortens the active display and is required
for digitizing) to complete before the start of a field register
transfer. The vertical sync pulses are buffered, so the start of the
field transfer request can be honored immediately after the previous
transfer request is finished.


VBUS codes annotated in Figure 3-2:

WRDIGINP   Place VRAMs in write mode; set 82750PB pointer to L0
WRDIGINP   Transfer garbage to L0 address (select L0)
WRDIGI     Transfer L0 data to L0 address; set 82750PB pointer to L1
WRDIGINP   Transfer L0 to L1 address (select L1)
WRDIGI     Transfer L1 data to L1 address; set 82750PB pointer to L2
WRDIGINP   Transfer L1 to L2 address (select L2)
...
WRDIGI     Transfer Lf data to Lf address; set 82750PB pointer to Lf+1
WRDIGINP   Transfer Lf to Lf+1 address (select Lf+1)


Also, captured NTSC data may be displayed on a VGA-type monitor. This
requires the 82750DB to operate at a VGA frequency (approximately
31.5 kHz), which is twice that of NTSC. Each line of captured NTSC
data is read into the 82750DB twice. Setting the line replicate bit
makes doubling of memory unnecessary. Figure 3-3 illustrates how the
82750DB operates in such a mode. The Line Replicate, Digitizer, and
Genlock bits in the Miscellaneous Control register are assumed to be
set to one. During the HBI of the first display line, a dummy write
transfer cycle (WRDIGINP) places the VRAMs in the write mode. The
82750PB then sets the bitmap pointers to the first line's address
(L0). This code is immediately followed by a WRDIGINP command,
causing the 82750PB to perform a write transfer cycle at the L0
address. Since no digitized data has been read in, unknown values are
loaded into row L0 of the VRAM array.


At the end of the first line the 82750DB sends two WRDIGINP codes to
the 82750PB, thereby transferring the L0 data from the shift register
to the VRAM array at the L0 address. The 82750PB does not perform a
pitch calculation, so the pointer remains at the address for L0.
After the second display line (which has the same data as the first
line), a WRDIGI code is sent to the 82750PB that writes the L0 data
to the L0 address and updates the bitmap pointer to L1. The WRDIGINP
signal immediately following this selects the L1 address. After the
third line of data, two WRDIGINP codes that select the L1 address are
sent. After the fourth line (which has the same data as the third
line), a write operation is performed to load L1 data into the L1
address, and the 82750PB pointer is updated to address L2. A WRDIGINP
code is sent to select the L2 address. This same procedure continues
for the entire active display, until the last active line is reached
(m + 5). A final pair of WRDIGI and WRDIGINP codes, or two WRDIGINP
codes, is sent to the 82750PB to load in the last line of data. At
the start of horizontal sync of the next line, the FCO signal will be
negated.


Figure 3-3. Digitizing Example with Line Replicate
(Timeline of lines n+4 through m+6 showing FCO asserted and negated,
each line of digitized data L0, L1, ... appearing on two consecutive
display lines, and the WRDIGINP and WRDIGI codes issued between
lines.)

VBUS codes annotated in Figure 3-3:

WRDIGINP   Place VRAMs in write mode; set 82750PB pointer to L0
WRDIGINP   Transfer garbage to L0 address (select L0)
WRDIGINP   Transfer L0 data to L0 address (select L0)
WRDIGINP   Transfer L0 data to L0 address (select L0)
WRDIGI     Transfer L0 data to L0 address; set 82750PB pointer to L1
WRDIGINP   Transfer L0 to L1 address (select L1)
WRDIGINP   Transfer L1 data to L1 address (select L1)
WRDIGINP   Transfer L1 data to L1 address (select L1)
WRDIGI     Transfer L1 data to L1 address; set 82750PB pointer to L2
WRDIGINP   Transfer L1 to L2 address (select L2)
...
WRDIGI     Transfer Lf data to Lf address; set 82750PB pointer to Lf+1
           (if WRDIGINP, then select row Lf+1)
WRDIGINP   Transfer Lf to Lf+1 address (select Lf+1)


4.0 PROGRAMMING THE 82750DB 


Overview 

All registers are loaded by the issuance of a REGX command from the
82750DB to the 82750PB over the VBUS. This causes the 82750PB to load
a sequence of register values into the VRAM serial output registers
from an address designated by an 82750DB register pointer. After the
request is granted, a new 82750DB register word is read in with each
SCLK. Each 32-bit word consists of a register address in the high
byte and register values in the rest of the word. The sequence is
terminated by a stop code that corresponds to the address byte being
equal to 0xff. A variable number of 32-bit words can be loaded.
During reset, if a stop code is not found within 256 T-cycles, the
register transfer is terminated, a SHUTDOWN code is asserted on the
VBUS, and the 82750DB returns to the reset state. All transfer
requests are terminated at the start of a new field. This ensures
that non-terminating register transfers caused by bad register data
will be halted.
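
The register word layout (address in the high byte, value in the remaining 24 bits, terminated by an address byte of 0xff) can be illustrated with a small Python sketch. The function names are hypothetical. The sketch deliberately ignores two things described elsewhere in this document: the sixteen undecoded cursor words that follow the Cursor Control register, and the checksum word, whose algorithm and exact position are not specified in this section.

```python
STOP_ADDRESS = 0xFF  # address byte that terminates a register transfer


def make_register_word(address, value):
    """Build a 32-bit register word: address in bits 31:24, value in 23:0."""
    if not 0 <= address <= 0xFF:
        raise ValueError("register address is one byte")
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("register value is 24 bits")
    return (address << 24) | value


def split_register_stream(words):
    """Return the (address, value) pairs that precede the stop code."""
    entries = []
    for word in words:
        address = (word >> 24) & 0xFF
        if address == STOP_ADDRESS:
            return entries
        entries.append((address, word & 0xFFFFFF))
    raise ValueError("no stop code found in register data")
```

A stream without a stop code raises an error in the sketch, which mirrors the hardware's refusal to accept a non-terminating transfer.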


During this register transfer, and on all subsequent register
transfers (programmed or automatic), the 82750DB performs a vertical
checksum on the register data. The last 32-bit word read in during a
register transfer is the user-generated checksum of that register
data. If the 82750DB-generated checksum does not match the
user-generated checksum, the 82750DB sends out a SHUTDOWN code to the
82750PB via the VBUS, and will automatically re-enter the reset
state.


Pipeline Delay through the 82750DB 


The actual horizontal pipeline delay through the 82750DB is dependent
on the processing elements used to generate the output. If Y
interpolation is not used, the pipeline delay is:

    Horiz. Active Pipeline Delay = 16 cycles +
        SCLK Transfer Timing Delay

Here the SCLK Transfer Timing Delay is 1 for 1X, 2 for 1/2X, and 3
for 1/3X.

If Y interpolation is used, the pipeline delay is:

    Horiz. Pipeline Delay = 16 cycles +
        SCLK Transfer Timing Delay + Integer (Pixel Time)

The Integer (Pixel Time) is simply the integer value of the
programmed pixel time. The horizontal pipeline delay for blanking
differs from that of active. Whether Y interpolation is on or off,
the pipeline delay for horizontal blanking is:

    Horiz. Blanking Pipeline Delay = 10 cycles +
        SCLK Transfer Timing Delay

The horizontal sync pipeline delay is always equal to 0 cycles.

Thus all horizontal parameters (e.g., horizontal blanking start,
active stop) must be programmed to account for the total horizontal
pipeline delay. The vertical pipeline delay through the 82750DB is 5
lines. The vertical blanking and vertical sync pipeline delays are
always equal to 0 lines. All vertical parameters must be programmed
so that this delay is taken into account.
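
The horizontal delay formulas above can be collected into a small Python helper for use when computing programmed start and stop values. The function names and the string keys for the SCLK modes are hypothetical; the numbers come directly from the formulas in this section.

```python
# SCLK Transfer Timing Delay in T-cycles for each transfer mode.
SCLK_TRANSFER_DELAY = {"1X": 1, "1/2X": 2, "1/3X": 3}


def horizontal_active_delay(sclk_mode, pixel_time, y_interpolation=False):
    """Active-display pipeline delay in T-cycles."""
    delay = 16 + SCLK_TRANSFER_DELAY[sclk_mode]
    if y_interpolation:
        # Integer (Pixel Time): integer part of the programmed pixel time.
        delay += int(pixel_time)
    return delay


def horizontal_blanking_delay(sclk_mode):
    """Blanking pipeline delay in T-cycles (independent of Y interpolation)."""
    return 10 + SCLK_TRANSFER_DELAY[sclk_mode]


HORIZONTAL_SYNC_DELAY = 0  # always 0 cycles
```

For example, 1/3X transfers with a pixel time of 2.5 give an active delay of 19 T-cycles without Y interpolation and 21 T-cycles with it.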


Saat, o 
| a iG. 82750DB 
PROGRAMMING CONSIDERATIONS Cursor Control Register — Lees _ Ox5a 


The user must ensure that the 82750DB is pro- 
grammed correctly. Illegal or illogical combinations 
of display parameters are not corrected in hardware, 
and may cause the 82750DB to output erroneous 
display or timing information. The following list high- 
lights some basic guidelines to iglow when pro- 
gramming the 82750DB. 


1. The maximum rate that data may be read into the 
82750DB is determined by the type of memory 
used. This in turn effects the maximum rate and 
depth of data that can be displayed. If 32 bits of 
data can only. be’ read into the 82750DB every 
two clock cycles, only 16 bits of data may be dis- 
played every clock cycle. The programmer 
should match the transfer rate (1X, 1/2X, or 
1/3X) with the memory speed, and the display 
pixel rate with the Pixel depth and memory band- 
width. 


2. Blanking intervals of the display are defined by
the non-active programmed time. During this portion
of the display, programmed transfers take
place. If a transfer does not complete before the
start of the active display, it is terminated, and
active display data is shifted into the 82750DB at
the programmed rate. During horizontal blanking
intervals, the user should allow enough time for
all programmed register, colormap, and VU data
transfers to complete.


3. When digitizing (capturing) images, no other bitmap
transfers (e.g., REGX, VU) should be scheduled
to occur during the active portion of the field.


4. Active start and stop times should not be programmed
to overlap the blanking stop and start
times, taking the pipeline delay through the
82750DB into account.


5. Programming the Y interpolation to occur in a 
non-integral pixel width will cause the Y channel 
to output incorrect data. 
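
Guideline 1 can be checked with a little arithmetic before any registers are written. The following Python sketch is illustrative only: the function names are hypothetical, the 32-bit input width is taken from the DATAIN[31:0] bus, and the pixel time is expressed in clock cycles per pixel.

```python
from fractions import Fraction

# Transfer rates named in guideline 1, as a fraction of one 32-bit read per cycle.
TRANSFER_RATES = {"1X": Fraction(1), "1/2X": Fraction(1, 2), "1/3X": Fraction(1, 3)}


def bits_available_per_cycle(rate, bus_width=32):
    """Average number of bits read into the part per clock cycle."""
    return bus_width * TRANSFER_RATES[rate]


def display_fits(rate, bits_per_pixel, pixel_time_cycles):
    """True if the memory can supply pixels as fast as they are displayed.

    pixel_time_cycles is the duration of one pixel in clock cycles
    (an assumption of this sketch; it may be a Fraction for non-integer times).
    """
    needed = Fraction(bits_per_pixel) / Fraction(pixel_time_cycles)
    return needed <= bits_available_per_cycle(rate)
```

With a 1/2X transfer, 16 bits are available per cycle, which matches the example in guideline 1: 16-bit pixels at one pixel per cycle fit, 32-bit pixels at the same pixel rate do not.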


CURSOR REGISTERS


The following registers are used to program the 
characteristics of the on-chip cursor. 


Cursor Position Update Register 0x5b


31 24 | 23 12 | 11 0


01011011 | Vertical Position | Horizontal Position


— Horizontal Position in units of T-cycles 
— Vertical Position in units of full lines 


This register gives the horizontal and vertical position
of the cursor. The cursor will extend 16-pixel
periods, starting at the prescribed horizontal position,
for the next 16 lines. (Or 32-pixel periods for 32
lines if the 2X Cursor Mode bits in the General Control
register are set to one.)
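
The register layouts in this section share one pattern: the register address in bits 31 to 24, followed by two 12-bit fields. A minimal Python sketch of packing such a word is shown below; the function and constant names are hypothetical and are not part of any Intel software.

```python
CURSOR_POSITION_UPDATE = 0x5B  # register address from the heading above


def pack_register_word(address, upper12, lower12):
    """Build a 32-bit register word: address in bits 31:24,
    upper12 in bits 23:12, lower12 in bits 11:0."""
    if not 0 <= address <= 0xFF:
        raise ValueError("address must fit in 8 bits")
    if not 0 <= upper12 <= 0xFFF or not 0 <= lower12 <= 0xFFF:
        raise ValueError("each field must fit in 12 bits")
    return (address << 24) | (upper12 << 12) | lower12


# Cursor at line 100, T-cycle 200 (example values only).
word = pack_register_word(CURSOR_POSITION_UPDATE, 100, 200)
```

For the cursor registers, the upper field is the vertical position in full lines and the lower field is the horizontal position in T-cycles.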


Cursor Control Register 0x5a


31 24 | 23 12 | 11 0


01011010 | Vertical Position | Horizontal Position


— Horizontal Position in units of T-cycles
— Vertical Position in units of full lines


This register also gives the horizontal and vertical
position of the cursor. The cursor will extend 16-pixel
periods, starting at the prescribed horizontal position,
for the next 16 lines. (Or 32-pixel periods for 32
lines if the 2X Cursor Mode bits in the General Control
register are set to one.) Receipt of this address
also causes the 82750DB to interpret the next sixteen
32-bit words of register data as the 16 x 16 x
2-bit cursor map. This will cause the register address
decoding logic internal to the 82750DB to be disabled,
and the next 16 words of information will be
loaded into the Cursor table. Each 32-bit word will be
interpreted as a line (16 pixels) of cursor data, with
the two least significant bits corresponding to the
first cursor pixel to be displayed.
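
The cursor map packing described above can be sketched in Python as follows. The function names are hypothetical, and the meaning of each 2-bit pixel value is not assumed here; the sketch only shows the bit placement, with the first displayed pixel in the two least significant bits.

```python
def pack_cursor_line(pixels):
    """Pack 16 two-bit cursor pixels into one 32-bit word.

    pixels[0] is the first pixel displayed and lands in bits 1:0.
    """
    if len(pixels) != 16:
        raise ValueError("a cursor line has exactly 16 pixels")
    word = 0
    for i, value in enumerate(pixels):
        if not 0 <= value <= 3:
            raise ValueError("each cursor pixel is a 2-bit value")
        word |= value << (2 * i)
    return word


def pack_cursor_map(rows):
    """Pack a 16 x 16 cursor into the sixteen words that follow
    the Cursor Control Register address."""
    if len(rows) != 16:
        raise ValueError("a cursor map has exactly 16 lines")
    return [pack_cursor_line(row) for row in rows]
```

The sixteen words returned by `pack_cursor_map` would be sent immediately after the Cursor Control Register word, since address decoding is disabled while they are loaded.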


Cursor Color 3 0x59


31 24 | 23 16 | 15 8 | 7 0


01011001 | Blue/U Color | Red/V Color | Green/Y Color


If the cursor is enabled and the 24 bits of data in this
register are selected, the data will be sent directly to
the YUV conversion matrix during active display. The
bits should be programmed as RGB values when the
YUV to RGB matrix is not being used.


Cursor Color 2 0x58


31 24 | 23 16 | 15 8 | 7 0


01011000 | Blue/U Color | Red/V Color | Green/Y Color


If the cursor is enabled and the 24 bits of data in this
register are selected, the data will be sent directly to 
the YUV conversion matrix during active display. The 
bits should be programmed as RGB values when the 
YUV to RGB matrix is not being used. 


Cursor Color 1 0x57


31 24 | 23 16 | 15 8 | 7 0


01010111 | Blue/U Color | Red/V Color | Green/Y Color


If the cursor is enabled and the 24 bits of data in this
register are selected, the data will be sent directly to
the YUV conversion matrix during active display. The
bits should be programmed as RGB values when the
YUV to RGB matrix is not being used.




DISPLAY TIMING REGISTERS 


Each register has two 12-bit components, listed
with the 12 least significant bits first, followed by the 12
most significant bits. Horizontal timing is measured
in units of T-cycles (periods of the master clock)
from the start of horizontal sync. The register content
defines the number of T-cycles that elapse before
the event controlled by this register takes place.
The exception to this rule is the base counter, which
specifies the number of T-cycles/half line. Zero is
not an allowable value; use the total number of T-cycles
per half line or full line instead. Unused bits
should be zero. Sync signals are RESET to initial
values as specified for each; "start" means to set to
1, and "stop" means to reset to zero.


Base Counter 0x56


31 24 | 23 12 | 11 0


01010110 | # of Lines/Field | # of T-Cycles/Half Line


— T-cycles/Half Line in units of T-cycles (periods of the
master clock)
— Half Lines/Field in units of half lines


As defined by NTSC standards, vertical timing can
be measured from the start of a field in one of two
ways: either in units of half lines, or in units of full
lines. When programmed for an interlaced display
(i.e., an odd number of half lines per field), the start of
a field coincides with the start of a line on odd fields
and with the midpoint of a line on even fields. In the
latter case, for an event that is programmed in full
lines, the first half line is ignored, and counting begins
with the first full line. With this interpretation, the
register content defines the number of half or full
lines that elapse before the event controlled by this
register takes place. The same may be said for the
horizontal component, which is defined by the number
of T-cycles/half line. The hardware does not
look for or correct illogical combinations of register
settings. The monitor should be protected from damage
with external circuitry when debugging is in
progress.


All of the internal timing is derived from comparing 
the programmed values with the values of this regis- 
ter. The horizontal base counter is programmed us- 
ing the least significant 12 bits. In this case the val- 
ues loaded into this register should be one less than 
the desired value. Bits 23 through 12 are used to 
specify the number of half lines per field. 
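
As a rough illustration of the Base Counter description, the Python sketch below builds the register word. It reads the "one less than the desired value" rule as applying to the horizontal (T-cycles/half line) field only, which is an assumption of this sketch and not something the text states outright. The function names and the example counts are hypothetical.

```python
BASE_COUNTER = 0x56  # register address from the heading above


def is_interlaced(half_lines_per_field):
    """An odd number of half lines per field gives an interlaced display."""
    return half_lines_per_field % 2 == 1


def base_counter_word(t_cycles_per_half_line, half_lines_per_field):
    """Pack the Base Counter register word.

    Assumption: only the horizontal field is programmed as desired - 1.
    """
    horizontal = t_cycles_per_half_line - 1
    if not 1 <= horizontal <= 0xFFF:
        raise ValueError("horizontal count must be nonzero and fit in 12 bits")
    if not 1 <= half_lines_per_field <= 0xFFF:
        raise ValueError("half-line count must be nonzero and fit in 12 bits")
    return (BASE_COUNTER << 24) | (half_lines_per_field << 12) | horizontal


# Example values only: 525 half lines per field, 455 T-cycles per half line.
example = base_counter_word(455, 525)
```

The zero checks reflect the rule that zero is not an allowable register value.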
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Sync Stops 0x55 


31 24 | 23 12 | 11 0


01010101 | VSYNC Stop | HSYNC Stop


— HSYNC Stop in units of T-cycles 
— VSYNC Stop in units of half lines 


Sync Starts 0x54 


31 24 | 23 12 | 11 0


01010100 | VSYNC Start | HSYNC Start


— HSYNC Start in units of T-cycles
— VSYNC Start in units of half lines


The Sync Stops and Sync Starts registers are used 
in conjunction with one another to specify the start 
and stop locations of the horizontal sync, HSYNC, 
and vertical sync, VSYNC, output signals. VSYNC 
may be programmed to start and stop at any time 
during a given field as defined on a half-line interval. 
Bits 23 through 12 in the Sync Starts and Sync 
Stops registers are used to define the start and stop 
times for VSYNC, respectively. Similarly, HSYNC 
may be programmed to start and stop at any line 
position as defined in units of T-cycles. Bits 11 
through 0 in the Sync Starts and Sync Stops registers
are used to define the start and stop positions
for HSYNC, respectively. 


The horizontal component of the Sync Stops register
also affects the composite sync, or CSYNC, output. In
this case, the CSYNC output will be the same as the 
HSYNC output, except during the vertical sync and 
equalization interval. In the latter case, the CSYNC 
output is determined by the Serration and Equaliza- 
tion registers. 


Blanking Stops 0x53 


31 24 23 12 11 0 


01010011 Vertical Blank Stop Horizontal Blank Stop 


— HB Stop in units of T-cycles 
— VB Stop in units of half lines 


The Blanking Start and Stop registers control the
composite blanking output (CB). The horizontal
blanking start and stop position, in units of T-cycles,
can be specified to occur at any time during the line.
By the same token, the vertical blanking start and
stop positions can be programmed to occur at any
half-line interval.




The CB output combines both the horizontal and
vertical blanking pulses programmed using these
two registers. This information is independent of
the HSYNC, VSYNC, and CSYNC outputs, so the
user must specify the proper blanking intervals for
the monitor that is being used. If the programmer
specifies the blanking period to end before the active
line starts, or start after the active line has ended,
the border color is output. Due to internal pipeline
delays on the 82750DB, the values should be
one less than desired for VB Start and Stop. For HB
Start and Stop, subtract the total horizontal pipeline
delay.
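
The blanking adjustments can be sketched as below. This Python example assumes that the "total horizontal pipeline delay" for blanking is the horizontal blanking pipeline delay given earlier (10 cycles plus the SCLK Transfer Timing Delay); the function names are hypothetical.

```python
HB_PIPELINE_BASE_CYCLES = 10  # from the horizontal blanking pipeline delay formula


def horizontal_blanking_delay(sclk_transfer_timing_delay):
    """Horizontal blanking pipeline delay in T-cycles."""
    return HB_PIPELINE_BASE_CYCLES + sclk_transfer_timing_delay


def blanking_fields(desired_hb, desired_vb, sclk_transfer_timing_delay):
    """Return (vb_field, hb_field) to program for a desired blanking edge.

    VB is programmed one less than desired; HB has the horizontal
    pipeline delay subtracted (assumed equal to the blanking delay above).
    """
    hb = desired_hb - horizontal_blanking_delay(sclk_transfer_timing_delay)
    vb = desired_vb - 1
    if not 0 <= hb <= 0xFFF or not 0 <= vb <= 0xFFF:
        raise ValueError("adjusted value does not fit in a 12-bit field")
    return vb, hb
```

The same adjustment applies to both the Blanking Starts and the Blanking Stops registers. How a desired edge that falls earlier than the pipeline delay should wrap is not covered by this sketch.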


Blanking Starts 0x52


31 24 | 23 12 | 11 0


01010010 | Vertical Blank Start | Horizontal Blank Start


— HB Start in units of T-cycles Resets to 1
— VB Start in units of half lines Resets to 1


Program values one less than desired for VB Start
and Stop. For horizontal blanking start, load the desired
value less the total horizontal pipeline delay.




Serration Start 0x51


31 24 | 23 12 | 11 0


01010001 | Not Used | Serration Start


— SER Start in units of T-cycles Resets to 0
— (not used)


The vertical component of the CSYNC (composite
sync) signal is made up of two types of pulses:
equalization and serration pulses. The window during
which the serration pulses are active is determined
by the VSYNC start and stop positions, as
shown in Figure 4-1. When vertical sync (VSYNC) is
active, in this case on line 3, the first serration pulse
is output on the CSYNC signal. This pulse will start
at the T-cycle count specified in bits 11 to 0 of the
Serration Start register. The pulse will end when the
half-line count specified in the Base Counter register
has been reached. This pulse will be repeated for
every half line that the VSYNC output is programmed
to be active, regardless of the position in
the field. In Figure 4-1, this continues until half line
12, or line 6.


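[Figure 4-1 diagram (240855-16): CSYNC at the start of the odd field, plotted against line count, showing pre-equalization pulses, serration pulses, and post-equalization pulses, with the Horizontal Equalization Stop, Vertical Serration Start, and Vertical Equalization Stop positions marked.]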


Figure 4-1. Programming the Video Sync Outputs 
Equalization Parameters 0x50


31 24 | 23 12 | 11 0


01010000 | Vertical Equalization Stop | Horizontal Equalization Stop


— EQH Stop in units of T-cycles Resets to 1
— EQV Stop in units of half lines Resets to 1


During the vertical equalizing period, which starts at
the beginning of the field, an equalization pulse is output on
the CSYNC signal at the beginning of each half line, 
as shown in Figure 4-1. The width of this equaliza- 
tion pulse is determined by the value in bits 11 to 0 
of this register. The half line on which these pulses 
are to stop is programmed in bits 23 through 12 of 
this register. If VSYNC is programmed to occur dur- 
ing the equalization interval (as it is for NTSC type 
displays), the serration pulses are output on the 
CSYNC signal. 


Active Region Stops 0x4f


31 24 | 23 12 | 11 0


01001111 | Vertical Active Stop | Horizontal Active Stop


— Actdis Stop in units of T-cycles 
— Vertical Stop in units of full lines 


The active region window, during which pixels to be 
displayed are fetched from VRAM, is defined by the 
Active Region Start and Stop registers. The first dis- 
play line is actually five lines after the line indicated 
in the vertical region of the Active Region Start regis- 
ter. The position of the active region on a horizontal 
line is determined by the horizontal component of 
the Active Region Start register. Pixels will be 
fetched from VRAM at a rate determined by the 
number of bits/pixel and pixel widths. In order for the 
82750DB to operate properly, the horizontal width of 
the active region window must be an integral number 
of display pixel widths, taking into account the hori- 
zontal pipeline delay. Also, the Active Region Start 
and Stop must fall within a single line boundary, as 
dictated by the Base Counter register. When the first 
pixel actually appears at the output of the 82750DB, 
the output is a function of the processing elements
used, as discussed above.


When the active region is over, the border color is 
output until the programmed blanking time is 
reached. Both the border and blanking information are
output at the transfer rate programmed by the user. 
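
Two of the active region constraints above lend themselves to a simple pre-check. The Python sketch below is a simplification under stated assumptions: it uses hypothetical names, assumes an integer pixel width in T-cycles, and checks the raw programmed values without modelling the horizontal pipeline delay.

```python
def check_active_region(h_start, h_stop, pixel_width, t_cycles_per_line):
    """Return a list of problems with a horizontal active region.

    h_start, h_stop   -- horizontal Active Region Start/Stop, in T-cycles
    pixel_width       -- display pixel width in T-cycles (integer here)
    t_cycles_per_line -- full line length implied by the Base Counter
    """
    problems = []
    if not 0 <= h_start < h_stop <= t_cycles_per_line:
        problems.append("start/stop must be ordered and lie within one line")
    if h_stop > h_start and (h_stop - h_start) % pixel_width != 0:
        problems.append("width is not an integral number of pixel widths")
    return problems
```

An empty list means neither rule is violated; since the hardware does not correct illegal combinations, a check of this kind has to happen in software.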
Active Region Starts 0x4e


31 24 | 23 12 | 11 0


01001110 | Vertical Active Start | Horizontal Active Start


— Actdis Start in units of T-cycles 
— Vertical Start in units of full lines 


Burst Gate Stop 0x4d


31 24 | 23 12 | 11 0


01001101 | Vertical BG Stop | Horizontal BG Stop


— Horizontal Stop Position in units of T-cycles
— Vertical Stop Position in units of full lines


The Burst Gate Horizontal and Vertical Start and 
Stop registers allow the user to program a window 
into which burst can be added. This is useful when 
modulating the outputs of the 82750DB. 


Burst Gate Start 0x4c


31 24 | 23 12 | 11 0


01001100 | Vertical BG Start | Horizontal BG Start


— Horizontal Start Position in units of T-cycles 
— Vertical Start Position in units of full lines 


VBUS CODE REGISTERS


The following group of registers is used by the programmer
to schedule when VBUS transfer or control
codes are to be sent to the 82750PB by the
82750DB.


Display Format Load Interrupt 0x4b


31 24 | 23 12 | 11 0


01001011 | Vertical DFL Position | Horizontal DFL Position


— Horizontal Position in units of T-cycles
— Vertical Position in units of full lines


This is the programmable XY interrupt, used by the
82750PB to perform a load of the Shadow Copy registers.
This interrupt is sent on the VBUS when
bits 23 to 12 match the current display line position, 
and bits 11 to 0 match the T-cycle count. 


Line Notification Timing 0x4a


31 24 | 23 12 | 11 0


01001010 | Not Used | Horizontal HLIN Position


— HLIN timing in units of T-cycles 
— Not Used 


This indicates the position on each line to send an
HLINE code on the VBUS. The 82750PB requires
this information to keep track of the current display
line when drawing graphics.


Refresh and Register Transfer 0x49


31 24 | 23 12 | 11 0


01001001 | REGX Line Number | Refresh Horizontal Position


— REFRESH horizontal timing in units of T-cycles
— Register Transfer Line number in units of full lines


When the T-cycle count matches the value programmed
into bits 11 to 0 of this register, a refresh
code is sent to the 82750PB. Since these codes tie 
up the 82750PB for at least eight 82750PB cycles, 
the programmer must ensure that no transfer re- 
quests are scheduled to occur during this time. 


The line number for the next register transfer is
specified in bits 23 to 12 of this register. If programmed
to occur, REGX will always be the first
transfer request sent to the 82750PB, immediately
after the end of active display.


COLOR REGISTERS


The following registers specify the state of DBU,
DRV, DGY, and ALPHA signals during the field.


Border Color 0x48


31 24 | 23 16 | 15 8 | 7 0


01001000 | Blue/U Color | Red/V Color | Green/Y Color


The 24 bits of data in this register are sent directly to
the YUV conversion matrix during border time. Border
time is defined as the region in which neither
active display nor blanking is programmed to occur.
The bits should be programmed as RGB values
when the YUV to RGB matrix is not being used.


Alpha Register 0x47


31 24 | 23 16 | 15 8 | 7 0


01000111 | Border Alpha | Alpha1 Register | Alpha0 Register


The least significant 8 bits are for the ALPHA0 register
and are used during blanking and if the alpha trap
value is not matched. The next 8 bits are for the
ALPHA1 register, used when the alpha trap value is
matched. The most significant 8 bits provide the alpha
channel value during the border time.


Blanking Color 0x46


31 24 | 23 16 | 15 8 | 7 0


01000110 | Blue/U Color | Red/V Color | Green/Y Color


The 24 bits of data in this register are sent directly
through the YUV conversion matrix during the programmed
blanking time.


CONTROL REGISTERS


The following registers are used to define the operating
modes of the 82750DB.


Pixel Control


Bit 23      Pseudo 16-Bit Mode
Bit 22      VU Pixel Replicate
Bits 21:19  Bits/Pixel
Bits 18:14  Pixel Time
Bits 13:11  VU Sample Select
Bit 10      4X VU Expand
Bit 9       VU Interlace Enable
Bit 8       Conditional Interpolation Enable
Bit 7       VU Interpolation Round
Bits 6:0    SCLK Delay

240855-17


Bits 6:0—SCLK Delay
Bits 6:0—SCLK Delay 


The number "m" of T-cycles from initiation of a
transfer code on the VBUS until the first SCLK is
asserted by the 82750DB.


Bit 7—VU Interpolation Round


When equal to 0, this bit means truncate during interpolation.
When set to one, this bit means round to
odd during interpolation.


Bit 8—Conditional Interpolation Enable


When reset to zero, this bit means all values of Y
and U are a full 8 bits of precision. When set to one,
this bit means the least significant bit of the Y sample or the U
sample controls the switching between VU interpolation
and graphics mode.


Bit 9—VU Interlace Enable


Setting this bit to a one causes the interpolator to
output different data on the odd and even fields.
During the odd field, the odd lines of the interpolation
sequence will be output. During the even field,
the even lines of the interpolation sequence will be
output. Full lines of the programmed number of samples
of both the V and U data will be read in during
each VU transfer. Setting this bit to a zero will cause
horizontally and vertically interpolated data to be
output on both fields. Only a full line of either V or U
samples will be read in during each transfer request
in this mode.


Bit 10—4X VU Expand


When this bit is set to a zero, a 2X expansion in both
directions is performed. By setting this bit to a one, a
4X expansion is performed.


Bits 13:11—VU Sample Select


Table 4-1 provides the code and number of V and U
samples for bits 13:11.


Table 4-1. VU Sampling


Number of V and U Samples
0 Samples for Each V and U


Bits 18:14—Pixel Time


Table 4-2 lists the codes and pixel duration for bits
18:14.


Table 4-2. Pixel Times
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Bits 21:19—Bits/Pixel 


Table 4-3 provides the code and number of bits/pixel
for bits 21:19.


Table 4-3. Number of Bits/Pixel


Code    Number of Bits/Pixel
001
100


Bit 22—VU Pixel Replicate 


When set to one, each pixel generated by the VU
Interpolator is held for 2-pixel times. This allows an
effective 8X expansion of VU data. This is useful for
high resolution applications where the blanking time
is not long enough for higher VU sample loads.


Bit 23—Pseudo 16-Bit Mode 


When set to one and 16 bits per pixel is chosen (bits
21:19), the 82750DB is in the 16-bit with Alpha
mode. Setting this signal to zero while in the 16-bit/pixel
mode puts the 82750DB into the 16-bit (655)
mode. This bit represents a "don't care" input for all
other values of bits/pixel.
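
The Pixel Control bit assignments above can be collected into a small field table. The Python sketch below is illustrative only: the field names are hypothetical, and the codes for VU Sample Select, Pixel Time, and Bits/Pixel must come from Tables 4-1 through 4-3, so they are passed in as raw values.

```python
# (least significant bit, width) for each Pixel Control field described above.
PIXEL_CONTROL_FIELDS = {
    "sclk_delay": (0, 7),
    "vu_interpolation_round": (7, 1),
    "conditional_interpolation_enable": (8, 1),
    "vu_interlace_enable": (9, 1),
    "vu_expand_4x": (10, 1),
    "vu_sample_select": (11, 3),
    "pixel_time": (14, 5),
    "bits_per_pixel": (19, 3),
    "vu_pixel_replicate": (22, 1),
    "pseudo_16bit_mode": (23, 1),
}


def pixel_control_bits(**fields):
    """Return the 24 data bits of the Pixel Control register.

    Unspecified fields default to zero.
    """
    value = 0
    for name, setting in fields.items():
        if name not in PIXEL_CONTROL_FIELDS:
            raise ValueError("unknown field: " + name)
        lsb, width = PIXEL_CONTROL_FIELDS[name]
        if not 0 <= setting < (1 << width):
            raise ValueError(name + " does not fit in its field")
        value |= setting << lsb
    return value
```

For example, `pixel_control_bits(sclk_delay=5, vu_expand_4x=1, bits_per_pixel=0b100)` sets bits 6:0, bit 10, and bits 21:19 and leaves everything else at zero.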


General Control 0x44


Bits 16:13  Vblen
Bit 12      Viden
Bit 11      Gren
Bit 10      Sync Test
Bits 9:8    Channel Test Select
Bit 7       2X Vertical Cursor
Bit 6       2X Horizontal Cursor
Bit 5       Cursor Enable
Bits 4:0    Burst Multiple

Remaining bits: Reserved - Set To Zero
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Bits 4:0—Burst Multiple 


These bits are used to program a divisor of the
FREQIN clock input in order to recover the
3.58 MHz NTSC color subcarrier. The programmed
value is the two's complement of the desired divisor.
The allowed range of values is 00001 through 11111,
which corresponds to divisions of 31 through 1. Note
that the 82750DB must be operating at an integer
multiple of 3.58 MHz for this to work effectively.


Bit 5—Cursor Enable


When set to one, the hardware cursor will output the
cursor data at prescribed intervals if programmed to
do so.


Bit 6—2X Horizontal Cursor


When this bit is set to one, and the Cursor Enable bit
is set to one, every pixel on each line of the cursor
will be replicated once. Thus a cursor that was
16 x 16 pixels will become 32 x 16 pixels.


Bit 7—2X Vertical Cursor


When this bit is set to one, and the Cursor Enable bit
is set to one, each line of the cursor will be replicated
once. Thus a cursor that was 16 x 16 pixels will
become a 16 x 32-pixel cursor.


Bits 9:8—Channel Select


These two bits control which output channel is
muxed onto the alpha digital outputs. It allows Y, U,
or V data to be available at the alpha channel. The
coding is provided in Table 4-4.


Table 4-4. Test Mode Select Coding


Channel Output
Y Channel


Bit 10—Sync Test


This bit must be set to zero for proper operation.
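
The Burst Multiple encoding in bits 4:0 is a 5-bit two's complement of the divisor, which can be sketched in Python as follows (the function name is hypothetical):

```python
def burst_multiple_code(divisor):
    """Return the 5-bit Burst Multiple value for a FREQIN divisor of 1..31."""
    if not 1 <= divisor <= 31:
        raise ValueError("divisor must be in the range 1 through 31")
    return (-divisor) & 0x1F  # two's complement, truncated to 5 bits
```

A divisor of 31 gives 00001 and a divisor of 1 gives 11111, matching the range stated for this field. As an example, a FREQIN of four times the subcarrier frequency would use a divisor of 4, which encodes as 11100.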
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Bit 11—Gren 


This is the Graphics Enable bit for the Y Interpolator.
When this bit is set to one and the pixel is a graphics
pixel (switch is zero), a 2X interpolation will be performed
on the pixel.


Bit 12—Viden


This is the Video Enable bit of the Y Interpolator.
When this bit is set to one and the pixel is a video
pixel (switch is one), a 2X interpolation will be performed
on the pixel.


Bits 16:13—Vblen


These bits program the T-cycle length of each VBUS
code. The VBUS code length will be one T-cycle
longer than the programmed value. These bits must
have a minimum value of 2, and a maximum value of
15.


Miscellaneous Control 0x43


Bit 23      Reserved (write as zero)
Bit 22      Line Replicate Enable
Bit 21      82750DB Enable
Bits 20:19  Transfer Timing Select
Bit 18      Video Pass
Bit 17      Graphics Pass
Bit 16      Split CLUT
Bit 15      Bypass Conversion Matrix
Bit 14      Genlock Enable
Bit 13      Switch on LSB of Y
Bit 12      Alpha Enable
Bit 11      VU Interpolator Output Enable
Bit 10      Digitize Enable
Bit 9       Border Alpha Enable
Bit 8       Alpha Trap Select
Bits 7:0    Alpha Trap Value
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Bits 7:0—Alpha Trap 


Bits 7:0 hold an 8-bit value used for comparison with
the current pixel's Y value, to select one of two programmable
alpha values.


Bit 8—Alpha Trap Select 


A value of one enables the Y value of the current
pixel to be compared with the value in the Alpha
Trap register. If the two values match and Alpha has
been enabled via the Alpha Enable bit, the contents
of the ALPHA1 register are output on ALPHA[7:0]. If
the two values don't match and Alpha Enable has
been set to one, the content of the ALPHA0 register
is output. When Alpha Trap Select is set to a zero in
the pseudo 16- or 32-bit mode, the most significant
byte of the pixel word is output. When Alpha Trap
Select is set to zero in all other modes, the value of
the ALPHA0 register is output.


Bit 9—Border Alpha Enable 


A value of one enables the eight most significant bits
in the ALPHA register to be output. When set to a
zero, the ALPHA0 register is output during border
time.


Bit 10—Digitize Enable 


When this bit is set to a one, the FCO signal will be 
set to a one, and the transfer codes for bitmaps will 
indicate that write operations should occur. 


Bit 11—VU Interpolator Output Enable


This bit enables VU interpolation data to be displayed.
When set to a zero, all pixels are treated as
graphic pixels.
Bit 12—Alpha Enable


When set to one, the alpha output is governed by
the alpha trap value, as described above. When reset
to zero, the contents of the ALPHA0 register are
the alpha output in the 8- and 16-bit modes, and the
explicit ALPHA data encoded in the pixel is the alpha
output in the pseudo 16- and 32-bit modes.
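
The interaction of Alpha Enable (bit 12) and Alpha Trap Select (bit 8) during active display can be summarized in a few lines of Python. This is one reading of the two bit descriptions, not a statement of the hardware logic: in particular, it assumes that Alpha Trap Select has no effect when Alpha Enable is zero. All names are hypothetical.

```python
def alpha_output(y, pixel_alpha, *, alpha_enable, trap_select,
                 trap_value, alpha0, alpha1, explicit_alpha_mode):
    """Select the value driven on ALPHA[7:0] for one active-display pixel.

    explicit_alpha_mode is True in the pseudo 16-bit and 32-bit modes,
    where the pixel word carries its own alpha byte (pixel_alpha).
    """
    if alpha_enable and trap_select:
        # Trap comparison: ALPHA1 on a match with the pixel's Y value.
        return alpha1 if y == trap_value else alpha0
    # No trap comparison: explicit alpha if the mode has it, else ALPHA0.
    return pixel_alpha if explicit_alpha_mode else alpha0
```

Border and blanking alpha are handled separately (see Border Alpha Enable and the Alpha Register) and are outside this sketch.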


Bit 13—Switch on LS Bit of Y 


When set to one, the least significant bit of Y is used 
as a Video/Graphics switch in all modes. When re- 
set to zero, the least significant bit of U from the 
interpolator acts as a switch. 


Bit 14—Genlock Enable 


This bit enables the genlock mode of the 82750DB. 
In this mode, receipt of the external HRESET# sig- 
nal during the second half of a scan line will cause 
the termination of that scan line. Similarly, receipt of 
the externally produced VRESET# signal will terminate
the field. In both cases, terminate denotes that
the proper on-chip signals are produced to signify
end of the line and end of the field.


Bit 15—Bypass Conversion Matrix 


When this bit is set to a one, the YUV to RGB matrix
will be bypassed, and the Y, U, and V data will feed
directly into the Digital to Analog Converters.


Bit 16—Split CLUT 


This bit divides the CLUT into an odd and an even 
half, depending on the polarity of the Video/Graph- 
ics switch. This switch is selectable and may be ei- 
ther the LSB of U from the interpolator or Y from the 
pixel word. The LSB of the CLUT address is set to 
one (odd address) if the Video/Graphics switch is 
one; the LSB of the CLUT address is set to zero 
(even address) if the Video/Graphics switch is zero. 


Bit 17—Graphics Pass 
Setting this bit to a one bypasses the CLUT for 
graphics pixels, even in non-mixed modes.


Bit 18—Video Pass 


When set to a one, all video pixels (luminance val- 
ues associated with sub-sampled UV values) will by- 
pass the color table. For mixed modes, this corre- 
sponds to the switch flag having a value of one. 
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Bit 20:19—Transfer Timing Select 


These bits are a two-bit code that selects one of three
possible transfer shift clock rates. This allows the
operating speed of the 82750DB to be tailored to the
external memory access time. After RESET, the
transfer rate is set to the slowest possible clock rate
(1/3X). The programmed rate is used during all non-active
display times for transferring data from
VRAMs. It also defines the rate at which the border and
blanking data is output. During active display, the
data is read as needed from VRAM using the programmed
timing. The coding of these bits is listed in
Table 4-5.


Table 4-5. Coding of Transfer Timing Select Bits


1/3X Transfer (Default)
1/2X Transfer
1X Transfer


Bit 21—82750DB Enable


When set to zero, the 82750DB will be the register
equivalent of an 82750DA. When set to a one, all the
features of the 82750DB will be enabled.


Bit 22—Line Replicate Enable


When this bit is set to one, every line in the active
display is generated twice. Each new bitmap transfer
occurs at half the line rate, with a new VBUS code
being used to indicate that a transfer is to take place
without the pitch calculation. The VU Interpolator will
also duplicate the lines it generates, yielding more
time between transfer cycles. This mode is useful for
obtaining a 2X increase in vertical resolution without
the need for increasing the VRAM transfer bandwidth.


COLOR MAP REGISTERS


The following registers are used to access and control
the three 256 x 8-bit Color Lookup Tables.


Mask Data Registers 0x42


31 24 | 23 16 | 15 8 | 7 0


01000010 | Blue/U Mask Data | Red/V Mask Data | Green/Y Mask Data


Each of the three 8-bit registers contains the bit pattern
used when the corresponding bit in the Mask
Set register is asserted.


Mask Set Registers 0x41


31 24 | 23 16 | 15 8 | 7 0


01000001 | Blue/U Color | Red/V Color | Green/Y Color


This is a 24-bit register that contains the mask bit
pattern for the RGB/YUV color map addresses.
When a bit in this register is asserted, the corresponding
bit in the address is set to the value defined
in the Mask Data registers.


CLUT Index Register 0x40


01000000 | Not Used | YUV CLUT Index


The CLUT Index register is an 8-bit register used for
loading the color tables. This register maps the user-specified
6-bit color map address into an 8-bit address.
A logical OR operation is performed between
the 6-bit address and the 8-bit index word to obtain
the new address.


CLUT Addresses 0x00-0x3f


If the 82750DB Enable mode bit in the Miscellaneous
Control register is set to zero, the CLUT addresses
are decoded to appear as addresses to the
reduced-size 82750DA color table. The least significant
four bits of the address are used for the Y color
table address, and the upper nibble is used to address
the V and U color table simultaneously. This is
a compatibility mode for the 82750DA, which has a
reduced-size color table.


31028 24 23 16 15 8 


UV Address | _Y Address 


If the 82750DB Enable mode bit is set to one, the full 
color table is used. In this case, the most significant 
byte of the 32-bit data word is used as an address to 
the color table. The address is ORed with the most 
recently loaded CLUT Index register. 
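
The two CLUT addressing modes can be sketched in Python as below. The function names are hypothetical, and the sketch models only the address arithmetic described in the text (the OR with the CLUT Index in 82750DB mode, and the nibble split in 82750DA compatibility mode).

```python
def clut_address_db(address_byte, clut_index):
    """82750DB Enable = 1: OR the address byte with the CLUT Index register."""
    if not 0 <= address_byte <= 0xFF or not 0 <= clut_index <= 0xFF:
        raise ValueError("address and index are 8-bit values")
    return address_byte | clut_index


def clut_address_da(address_byte):
    """82750DB Enable = 0: split the address into Y and VU table addresses."""
    if not 0 <= address_byte <= 0xFF:
        raise ValueError("address is an 8-bit value")
    return {"y": address_byte & 0x0F, "vu": (address_byte >> 4) & 0x0F}
```

With a CLUT Index of 0x40, register address 0x05 reaches table entry 0x45, so changing the index moves the 64-entry register window around the 256-entry tables.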


31° 30 24. .| 23 16 |. 15 8 


Cee [wre [vo [vee [ram 
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82750DB Register Summary 




The following table illustrates the register space of the 82750DB. 


Table 4-6. 82750DB Register Space


Address      82750DB Register
0x00-0x0f    CLUT Locations 0-15
0x10-0x30    CLUT Locations 16-48
             CLUT Location 49
             CLUT Mask Data Register
             Miscellaneous Control
0x44         General Control
             Pixel Control
             Blanking Color
             Register Transfer
             Line Notification and Timing
             Not used
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5.0 ELECTRICAL DATA 


Maximum Ratings 


Table 5-1 is a stress rating only, and functional operation
at the maximums is not guaranteed. Functional operating
conditions are given in the DC and AC Characteristics
(Tables 5-2, 5-3, and 5-4).

Exposure to the Maximum Ratings may affect device
reliability. Furthermore, although the 82750DB contains
protective circuitry to resist damage from static
electrical discharge, always take precautions to
avoid high static voltages or electric fields.


Table 5-1. Absolute Maximum Requirements


Parameter                                    Maximum Requirement
Case Temperature under Bias                  -65°C to 110°C
Storage Temperature                          -65°C to 110°C
Voltage on Any Pin with Respect to Ground    -0.5V to VCC + 0.5V
Supply Voltage with Respect to VSS           -0.5V to +6.5V


DC Characteristics 


Table 5-2. DC Characteristics VCC = 5V ±10%, TCASE = 0°C to 95°C


Symbol    Limit    Condition
IOZ
          0.4      IOL = 4.0 mA


NOTES:

1. Measured with FREQIN = 7 MHz.

2. Typical current value measured under typical conditions with the Digital Outputs (DGY, DRV, and DBU) toggling. Maximum
current value guaranteed with 50 pF maximum output loading. Analog Outputs disabled.

3. Typical current value measured under typical conditions with the Digital Outputs (DGY, DRV, and DBU) not toggling.
Maximum current value guaranteed with 50 pF maximum output loading. Analog Supply Current IACC not included.

4. Not 100% tested.
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AC Characteristics | | 
Table 5-3. AC Characteristics at 28 MHz VCC = 5V ±10%, TCASE = 0°C to 95°C, CL = 50 pF


Symbol  Parameter
        FCO Valid Delay
        DISDIG, TESTACT Setup
        DISDIG, TESTACT Hold
        SCLK[1:0] Valid Delay
        DATAIN[31:0] Setup
        ALPHA[7:0], ACTDIS, CB, BPP[0], BPP[1] Output Hold
        VBUS[3:0], SCLK[1:0], FCO, HSYNC, VSYNC, CSYNC, CB, BG,
        PIXCLK, DRV[7:0], DGY[7:0], DBU[7:0], ALPHA[7:0], ACTDIS,
        BPP[0], BPP[1] Float Delay
t19     DISDIG, DRV[7:0], DGY[7:0], DBU[7:0], Digital Output
        Disable Delay
t20     DISDIG, DRV[7:0], DGY[7:0], DBU[7:0], Digital Output
        Enable Delay
        DISDAC, RV, GY, BU Analog Output Disable Delay: 19
        DISDAC, RV, GY, BU Analog Output Enable Delay: 19 ns, Figure 5-11 (Note 6)
NOTES:
1. This assumes a 35 ns period. For other speeds, the FREQIN High and Low Times should fall within a 40% to 60% duty cycle.
2. For integer pixel times, the PIXCLK Valid Delay applies to all assertions of PIXCLK during active display time.
3. For non-integer pixel times, the PIXCLK Valid Delay applies to alternating assertions of PIXCLK during active display time.
4. Not 100% tested.
5. All A.C. specifications are measured at the 1.5V crossing point with a 50 pF load.
6. Analog output delay is measured at the 50% level of the full scale transition with RL = 75Ω and CL = 25 pF.


AC Characteristics

Table 5-4. AC Characteristics at 45 MHz (VCC = 5V ±10%, TCASE = 0°C to 95°C, CL = 50 pF)

Symbol  Parameter                                                 Min  Max  Unit  Figure  Notes
        Frequency
        FREQIN High Time                                                                  (Note 1)
        HSYNC, VSYNC, CSYNC, BG, ...
        FCO Valid Delay
        VBUS[3:0] Valid Delay
        RESETB#, VRESET#, HRESET#, DISDIG, TESTACT Setup
        RESETB#, VRESET#, HRESET#, DISDIG, TESTACT Hold
        SCLK[1:0] Valid Delay Low
        SCLK[1:0] Valid Delay
        DATAIN[31:0] Setup
        DATAIN[31:0] Hold
        PIXCLK Valid Delay
        PIXCLK Valid Delay
        DRV[7:0], DGY[7:0], DBU[7:0], ALPHA[7:0], ACTDIS, CB,
        BPP[0], BPP[1]/VUGR Output Hold
        VBUS[3:0], SCLK[1:0], FCO, HSYNC, VSYNC, DRV[7:0],
        DGY[7:0], ALPHA[7:0], ACTDIS, BPP[0], BPP[1]/VUGR
        Float Delay
        DISDIG, DRV[7:0], DGY[7:0], DBU[7:0], Digital Output
        Disable Delay
        DISDIG, DRV[7:0], DGY[7:0], DBU[7:0], Digital Output
        Enable Delay
        DISDAC, RV, GY, BU Analog Output Disable Delay
        DISDAC, RV, GY, BU Analog Output Enable Delay


NOTES:
1. This assumes a 22 ns period. For other speeds, the FREQIN High and Low Times should fall within a 40% to 60% duty cycle.
2. For integer pixel times, the PIXCLK Valid Delay applies to all assertions of PIXCLK during active display time.
3. For non-integer pixel times, the PIXCLK Valid Delay applies to alternating assertions of PIXCLK during active display time.
4. Not 100% tested.
5. All A.C. specifications are measured at the 1.5V crossing point with a 50 pF load.
6. Analog output delay is measured at the 50% level of the full scale transition with RL = 75Ω and CL = 25 pF.


Figure 5-1. Clock Waveforms (FREQIN)

Figure 5-2. Output Waveforms (FREQIN)

Figure 5-3. Input Waveforms (FREQIN at the 1.5V level; t6a, t6b)

SCLK mode timing diagrams (FREQIN, SCLK[1:0], DATAIN[31:0]), ending with Figure 5-6. 1/3X SCLK Mode

PIXCLK timing diagrams (FREQIN, PIXCLK at the 1.5V level)

Figure 5-9. TESTACT# Float Delay

Digital output timing diagram (DISDIG, DRV[7:0], DGY[7:0], DBU[7:0]; the marked regions indicate the high-impedance state)

Figure 5-11. DISDAC to Analog Output Delay
Digital to Analog Converter Electrical Characteristics

Table 5-5. DAC D.C. Characteristics (VCC = 5V ±10%; TCASE: 0°C to +95°C)

Symbol  Parameter                           Value                       Unit  Notes
Iref    Reference Current
Ifs     Output Current (Full Scale)         0.93 * (255/18.5) * Iref    mA    (Note 1)
        Output Voltage (Full Scale)         1.027
        Integral Nonlinearity                                           LSB
        Differential Nonlinearity                                       LSB
IACC    Analog Supply Current               3 * Ifs + 8                       (Note 2)
DDTR    DAC to DAC Tracking at Full Scale   5.0                         %     (Note 3)
                                            12                                (Note 4)

NOTES:
1. Maximum Ifs allowed = 22 mA.
2. Maximum IACC allowed = 74 mA. Typical value of IACC = 3 * Ifs + 6.
3. Maximum deviation between RV, GY and BU outputs at fullscale output voltage.
4. Not 100% tested.
5. All DAC testing done with Iref = 1500 µA.
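
The relationships in Table 5-5 and its notes can be checked numerically. The sketch below takes the full-scale expression 0.93 * (255/18.5) * Iref at face value (which column of the table it belongs to is not legible in this copy, so it may be a bound rather than a nominal value) and assumes currents are in mA, with Iref supplied in µA. The function names are illustrative only.

```python
IFS_MAX_MA = 22.0    # Note 1: maximum Ifs allowed
IACC_MAX_MA = 74.0   # Note 2: maximum IACC allowed


def full_scale_current_ma(iref_ua: float) -> float:
    """Full-scale output current in mA from the Table 5-5 expression."""
    return 0.93 * (255.0 / 18.5) * (iref_ua / 1000.0)


def typical_iacc_ma(ifs_ma: float) -> float:
    """Typical analog supply current in mA (Note 2: 3 * Ifs + 6)."""
    return 3.0 * ifs_ma + 6.0


# Note 5 states that DAC testing is done with Iref = 1500 uA.
ifs = full_scale_current_ma(1500.0)
iacc = typical_iacc_ma(ifs)
print(f"Ifs = {ifs:.2f} mA, typical IACC = {iacc:.2f} mA")
```

With the test reference current, the computed full-scale current stays below the 22 mA limit of Note 1, and 3 * 22 + 8 reproduces the 74 mA limit of Note 2.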
Table 5-6. DAC A.C. Characteristics

Parameter
Clock Feedthrough

NOTES:
1. Maximum value is for RL = 75Ω and CL = 25 pF. Defined as 10% to 90% fullscale transition.
2. Assumes an 80 MHz filter on output.
3. Glitch energy generated from the influence that 2 active outputs have on an idle output.
4. DISDIG must be tied high.
5. Assumes the use of 0.1 µF capacitor between VGCS and AVCC and 0.1 µF and 10 µF capacitors between IREFIN and AV


[Figure: 82750DB analog output stage, showing IREFIN, VGCS, +5V (AVCC), Ground, and the output to the monitor]

RL = 75Ω
RL = Load Resistance
C1 = 0.1 µF
C2 = 10 µF
CL = Load Capacitance
ifs = 208 * Iref 


Vfs = Ifs * RL

where:
0 ≤ Iout ≤ Ifs
0 ≤ Vout ≤ Vfs

Tr = Tf = 3 * RL * (CL + COUT)

Figure 5-12. Typical Output Configuration
Output Delay and Rise Time versus Load Capacitance

[Graph: typical output delay (ns) versus CL]

NOTE:
This graph will not be linear outside of the CL range shown.

Figure 5-13. Typical Output Valid

[Graph: rise time (ns), 0.8V to 2.0V, versus CL from 25 to 150 picofarads]

NOTE:
This graph will not be linear outside of the CL range shown.

Figure 5-14. Typical Output Rise Time versus Load Capacitance under Worst Case Conditions
6.0 MECHANICAL DATA

Packaging Outlines and Dimensions

Intel packages the 82750DB in a Plastic Quad Flat Pack (PQFP). Table 6-1 gives the symbol list for the PQFP.

Table 6-1. PQFP Symbol List

Symbol  Description of Dimensions
A       Package Height: Distance from Seating Plane to Highest Point of Body
A1      Standoff: Distance from Seating Plane to Base Plane
D/E     Overall Package Dimension: Lead Tip to Lead Tip
        Foot Radius Location
        Foot Length
        Total Number of Leads

The PQFP has the following specifications:
1. All dimensions and tolerances conform to ANSI Y14.5M-1982.
2. Datum plane -H- is located at the mold parting line and coincident with the bottom of the lead where lead exits plastic body.
3. Datums A-B and -D- are to be determined where center leads exit plastic body at datum plane -H-.
4. Controlling dimension is the inch.
5. Dimensions D1, D2, E1, and E2 are measured at the mold parting line and do not include mold protrusion. Allowable mold protrusion is 0.18 mm (0.007 in.) per side.
6. Pin 1 identifier is located within one of the two zones indicated.
7. Measured at datum plane -H-.
8. Measured at seating plane datum -C-.

Table 6-2 provides outline characteristics for 0.025-in. pitch.


Table 6-2. Intel Case Outline Drawings for PQFP at 0.025 Inch Pitch

Symbol  Description            Min     Max
A1      Standoff
D, E    Terminal Dimension
D2, E2  Bumper Distance
        Foot Radius Location   1.023   1.037
        Foot Length            0.020   0.030


Figure 6-1. Principal Dimensions of the 82750DB in the 132-Lead PQFP Package (drawing labels: base plane; seating plane -C-)


Figure 6-2. 132-Lead PQFP Mechanical Package Detail: Typical Lead (dimensions in mm (inch))

Figure 6-3. 132-Lead PQFP Mechanical Package Detail: Protective Bumper (dimensions in mm (inch))

Figure 6-4. Detailed Dimensions of the 82750DB in the 132-Lead PQFP Package: Molded Details

Figure 6-5. Detailed Dimensions of the 82750DB in the 132-Lead PQFP Package: Terminal Details (dimensions in mm (inch))


NOTES:
1. All dimensions and tolerances conform to ANSI Y14.5M-1982.
2. Datum plane -H- located at the mold parting line and coincident with the bottom of the lead where lead exits plastic body.
3. Datums A-B and -D- to be determined where center leads exit plastic body at datum plane -H-.
4. Controlling dimension, inch.
5. Dimensions D1, D2, E1 and E2 are measured at the mold parting line. D1 and E1 do not include an allowable mold protrusion of 0.18 mm (.007 in) per side. D2 and E2 do not include a total allowable mold protrusion of 0.18 mm (.007 in) at maximum package size.
6. Pin 1 identifier is located within one of the two zones indicated.
7. Measured at datum plane -H-.
8. Measured at seating plane datum -C-.
Package Thermal Specifications

The 82750DB is specified for operation when TC (the case temperature) is within the range of 0°C to 95°C. TC may be measured in any environment to determine whether the 82750DB is within specified operating range. The case temperature should be measured at the center of the top surface.

TA (the ambient temperature) can be calculated from θCA (thermal resistance from case to ambient) with the following equation:

TA = TC - P * θCA

Typical values for θCA at various airflows are given in Table 6-3 for the 132-lead PQFP package. When using the digital outputs, Table 6-4 shows the maximum TA allowable (without exceeding TC) at various airflows. The power dissipation (P) is calculated by using the typical supply currents at 5V as shown in Table 5-2.

Similarly, when using the analog outputs, the maximum TA allowed is a function of Ifs. The power is given by the following equation, which can then be used in calculating the maximum TA.

P = 5V * (Icont + (3 * Ifs + 8))

Table 6-3. Thermal Resistances (°C/W)

θCA Versus Airflow: ft/min (m/sec)

Package          200   400   600   800   1000
132-Lead PQFP

Table 6-4. Maximum TA at Various Airflows (°C)

TA Versus Airflow: ft/min (m/sec)

Package          Frequency   200      400      600      800      1000
                             (1.01)   (2.03)   (3.04)   (4.06)   (5.07)
132-Lead PQFP
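
The two thermal equations above (TA = TC - P * θCA and P = 5V * (Icont + (3 * Ifs + 8))) combine into a short calculation. The sketch below assumes the currents are in mA, which is consistent with the 74 mA IACC limit quoted for Ifs = 22 mA. The values for Icont (100 mA) and θCA (20 °C/W) are hypothetical placeholders, since the table entries are not legible in this copy.

```python
SUPPLY_V = 5.0       # supply voltage used in the power equation
TC_MAX_C = 95.0      # upper limit of the specified case temperature


def analog_mode_power_w(i_cont_ma: float, ifs_ma: float) -> float:
    """P = 5V * (Icont + (3 * Ifs + 8)), currents in mA, result in watts."""
    return SUPPLY_V * (i_cont_ma + (3.0 * ifs_ma + 8.0)) / 1000.0


def max_ambient_c(tc_c: float, power_w: float, theta_ca: float) -> float:
    """TA = TC - P * theta_CA, with theta_CA in degrees C per watt."""
    return tc_c - power_w * theta_ca


# Hypothetical inputs: 100 mA for Icont and 20 C/W for theta_CA.
power = analog_mode_power_w(i_cont_ma=100.0, ifs_ma=22.0)
ta_max = max_ambient_c(TC_MAX_C, power, theta_ca=20.0)
print(f"P = {power:.2f} W, maximum TA = {ta_max:.1f} C")
```

Substituting the real θCA for a given airflow and the real supply currents gives the corresponding Table 6-4 entry.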
PIXEL PROCESSOR

■ 25 MHz Clock with Single Cycle Execution
■ Zero Branch Delay
■ Wide Instruction Word Processor
■ 512 x 48-Bit Instruction RAM
■ 512 x 16-Bit Data RAM
■ Two Internal 16-Bit Buses
■ ALU with Dual-Add-With-Saturation Mode
■ Variable Length Sequence Decoder
■ Pixel Interpolator
■ High Performance Memory Interface
  - 32-Bit Memory Data Bus
  - 50 MBytes per Second Maximum
  - 25 MBytes per Second with Standard VRAMs or DRAMs
■ 16 General-Purpose Registers
■ 4 Gbyte Linear Address Space
■ 132-Pin PQFP
■ Compatible with the 82750PA

Intel's 82750PB is a 25 MHz wide instruction processor that generates and manipulates pixels. When paired with its companion chip, the 82750DB, and used to implement a DVI Technology video subsystem, the 82750PB provides real time (30 images/sec) pixel processing, real time video compression, interactive motion video playback and real time video effects.

Real time pixel manipulations, including 30 images/sec video compression, are supported by the 25 MHz instruction rate. On-chip instruction RAM provides programmability for execution of a wide range of algorithms that support motion video decompression, text, and 2D and 3D graphics. Inner loops are optimized with the integration of sixteen 16-bit quad ported registers, on-chip DRAM, and two loop counters that provide zero delay two-way branching "free" in any instruction. Two 16-bit internal buses enable two parallel register transfers on each 82750PB instruction, contributing to the real time performance of the video processing. Another feature that adds to the processing power of the 82750PB is the 16-bit ALU, which includes an 8-bit dual-add-with-saturate operation critical for pixel arithmetic. Other specialized features for pixel processing include a 2D pixel interpolator for image processing functions and a variable length sequence decoder for decoding compressed data.

The 82750PB is implemented using Intel's low-power CHMOS IV Technology and is packaged in a 132-lead space-saving, plastic quad flat pack (PQFP) package.


82750PB Subsystem Diagram (blocks and signals shown: Video Input, Video Digitizer, Serial Shift Register, DATAIN[31:0], 82750DB, VBUS[3:0], CSYNC, ALPHA[7:0], VRESET#, HRESET#, Video Mixer/Display Device, Video Output)

Intel Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in an Intel product. No other circuit patent licenses are implied. Information contained herein supersedes previously published specifications on these devices from Intel. February 1991


© INTEL CORPORATION, 1991 Order Number: 240854-003 
1.0 82750PB PIN DESCRIPTION

Pinout

Figure 1-1. 82750PB Pinout (top view)
Table 1-1. Pin Cross Reference by Pin Name

Pin Name                 Location
A3
A4
BE3#
CLKIN                    47
CLKOUT
D0 to D31
HALEN#
HALT#
HBUSEN#
HINT#
HRAM#
HRDY#
HREG#
HREQ#
MRDY#
MREQ#
NXTFST#
PMFRZ#
RESET#
RFSH#
TEST#
TRNFR#
VBUS[0] to VBUS[3]
VCC (multiple pins)
Table 1-2. Pin Cross Reference by Location

Location  Name        Location  Name
34        VSS         67        VCC
35        VCC         68        VSS
36        HBUSEN#     69        A23
37        TRNFR#      70        A24
38        HRDY#       71        VCC
39        VSS         72        A25
40        HREG#       73        A26
41        BE3#        74        A27
42        BE2#        75        VSS
43        BE1#        76        VCC
44        BE0#        77        A28
45        VCC         78        A29
46        VSS         79        A30
47        CLKIN       80        A31
48        VSS         81        CLKOUT
49        WE#         82        VSS
50        VBUS[3]     83        VCC
51        VCC         84        VSS
52        VBUS[2]     85        D31
53        VBUS[1]     86        D30
54        VBUS[0]     87        D29
55        HALEN#      88        D28
56        HREQ#       89        D27
57        VSS         90        VCC
58        HRAM#       91        VSS
59        MREQ#       92        D26
60        MRDY#       93        D25
61        NXTFST#     94        VCC
62        RFSH#       95        D24
63        RESET#      96        D23
64        VSS         97        D22
65        VCC         98        VSS
66        VSS         99        VCC


Figure 1-2. 82750PB Functional Signal Groupings (groups shown: CLKIN, RESET#, CLKOUT; VRAM interface with MREQ#, TRNFR#, RFSH#; address bus, byte enable bus BE#[3:0] and data bus, shared between the host and VRAM interfaces; host interface with HRAM#, HREQ#, HREG#, HALEN#, HBUSEN#, HINT#; microcode signals HALT# and PMFRZ#; VDP interface with the VDP communication bus VBUS[3:0])


Quick Pin Reference

Table 1-3 provides descriptions of 82750PB pins.

Table 1-3. Pin Descriptions

Symbol / Name and Function

CLKIN
CLKIN is a 1X CLOCK INPUT that provides the fundamental timing for the 82750PB. One cycle of CLKIN is denoted as one T-cycle.

RESET#
The 82750PB is reset and initialized by holding this signal active for at least ten T-cycles. Refer to Initializing the 82750PB Section in Chapter 3.

HREQ#
The HOST REQUEST signal is a request from the host CPU to perform a read or write access to either registers on the 82750PB, an external device, or to VRAM shared by the 82750PB and the host. The type of access that is requested is determined by the host access definition signals: HREG#, HRAM#, and WE#.

HREG#, HRAM#
The HOST REGISTER and HOST RAM signals, when validated by HREQ#, are used to define three host access cycles. HRAM# active indicates the host is requesting a VRAM read or write cycle. HREG# active indicates that the host is requesting an 82750PB register read or register write cycle. When both signals are inactive, a host external cycle is requested.

HBUSEN#
HOST BUS ENABLE is asserted by the 82750PB at the start of a host access to indicate that the 82750PB Address and Data buses (A[31:2], BE#[3:0], and D[31:0]) have been tri-stated. This allows the host to drive the same buses either for accessing shared VRAM or the 82750PB internal registers.

HALEN#
The HOST ADDRESS LATCH ENABLE signal is used to indicate to the 82750PB that the host has asserted a valid address (A[31:2], BE#[3:0]) and write enable (WE#).

HRDY#
HOST READY is asserted by the 82750PB at the end of a host access to indicate that the access cycle is ready for data transfer. For a host write cycle, HRDY# indicates that the 82750PB is ready to accept data from the host. For a host VRAM write cycle, HRDY# indicates that the VRAM has latched the data from the host. For a host read cycle, HRDY# indicates that output data from the 82750PB or VRAM is ready to be latched by the host.

HINT#
HOST INTERRUPT: This output is asserted when an interrupt condition is detected by the 82750PB, and the enable bit in the PROCESSOR CONTROL register corresponding to that interrupt condition is set to a ONE. HINT# stays active until the host CPU reads the INTERRUPT STATUS register. If an interrupt condition that is enabled occurs during the same cycle that the INTERRUPT STATUS register is being read, HINT# remains active.

D[31:0]
The DATA BUS is used to transfer data between:
1. The 82750PB and VRAM, and
2. The Host CPU and internal 82750PB registers.
During host VRAM accesses, this bus is tri-stated to allow the host to share the same VRAM data bus. During host accesses to internal 82750PB registers all 32 bits are used for data transfer.

A[31:2]
The ADDRESS BUS is shared between the 82750PB and the host for addressing VRAM. This 30-pin bus addresses 32-bit double words in VRAM. Byte Enable signals are used to address individual bytes or words within a double word in VRAM. In addition, the address for host accesses to internal 82750PB registers is communicated to the 82750PB using the lower seven pins, A[8:2], and the BE# pins. During host access cycles to either VRAM or 82750PB internal registers, A[31:2] are tri-stated. For internal register accesses, as indicated by HREG# being low, the lower seven bits, A[8:2], are used as the host address input.

CLKOUT
The CLOCK OUTPUT signal is one of the two internal clocks and is synchronized with CLKIN. It is always driven and will have a 50% duty cycle.
Table 1-3. Pin Descriptions (Continued)

Symbol / Name and Function

BE#[3:0] (I/O)
The BYTE ENABLE BUS is shared by the 82750PB and the host for addressing VRAM down to the byte level. The correspondence between the four Byte Enable pins and the D[31:0] pins is: BE#[3]-D[31:24], BE#[2]-D[23:16], BE#[1]-D[15:8], and BE#[0]-D[7:0]. During VRAM read cycles, the 82750PB enables all four bytes. During write cycles the 82750PB only enables those bytes that are to be written. Bytes that are not enabled are not to be altered in VRAM. During host accesses to 82750PB on-chip registers, the BE#[0] pin is used as an input to select whether the even or odd word is being accessed; the double word address is provided by the host on the A[8:2] pins. BE#[0] = 0 indicates that data is transferred on D[15:0]. BE#[0] = 1 indicates that data is transferred on D[31:16].

MREQ#
MEMORY REQUEST is asserted for the first cycle, T1, of each VRAM cycle.

TRNFR#, RFSH#
The MEMORY CYCLE DEFINITION SIGNALS: Transfer, Refresh and Write Enable are asserted at the same time as MREQ#, but stay active for the entire VRAM cycle. TRNFR# active indicates a VRAM transfer cycle. RFSH# active indicates a VRAM refresh cycle. If neither TRNFR# nor RFSH# is active, a VRAM data read or write cycle is requested.

WE# (I/O)
The WRITE ENABLE pin is used as an output during an 82750PB/VRAM cycle to drive the WE# signal, which defines the access as a VRAM read cycle (when inactive) or write cycle (when active). During Host/VRAM and Host External cycles, the 82750PB tri-states this pin to allow the host to drive the VRAM write enable signals directly. During Host/register cycles, this pin is used as an input for the Host Write Enable signal to determine whether the host is reading or writing the 82750PB register.

NXTFST#
The NEXT FAST signal indicates that the following VRAM cycle can be performed with a page-mode or bank-interleaved access. This signal is asserted during the first of a pair of VRAM cycles that is guaranteed to be within the same VRAM page and in opposite banks: a pair of accesses to two sequential double words in VRAM at addresses Even Address and Even Address + 1. In other words, A2 is a zero for the first cycle and a one for the second cycle.

MRDY#
The MEMORY READY input indicates that the VRAM cycle has progressed to the point where it is ready to perform the data transfer. For a VRAM read cycle, the VRAM data can be latched by the transition of MRDY# to an active state. For a VRAM write cycle, MRDY# indicates that the data has been latched into the VRAMs.

VBUS[3:0]
The VDP COMMUNICATION BUS is used to communicate from the 82750DB to the 82750PB. Codes sent over this bus indicate interrupt requests, transfer requests, and status information. Since the 82750DB and 82750PB run asynchronously, the VBUS signals are sampled on the falling edge of CLKIN and compared with the previous sample. For a VBUS code to be detected by the 82750PB, it must be valid for two successive samples.

HALT#
The HALT signal causes the microcode processor on the 82750PB to halt prior to executing the next instruction. This signal does not halt the VRAM interface. The Halt signal will allow the design of a hardware emulator for the 82750PB based on an 82750PB chip.

TEST#
The TEST signal is used for test purposes only and must remain high for normal operation.
Table 1-3. Pin Descriptions (Continued)

Symbol / Name and Function

PMFRZ#
The PERFORMANCE MONITORING AND FREEZE signal is toggled by specific microcode instructions and can be used to determine the time required to execute certain sections of microcode.

VCC
POWER pins provide the +5V D.C. supply input.

VSS
GROUND pins provide the 0V connection to which all inputs and outputs are referenced.
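
The cycle-definition rules in Table 1-3 amount to a small decoder. The sketch below is an illustration only: pin levels are passed as 0 (asserted, since these signals are active low) or 1, the function names are invented for this example, and the combinations the table does not describe (HREG# and HRAM# both active, or TRNFR# and RFSH# both active) are rejected as an assumption.

```python
def host_cycle(hreq_n: int, hreg_n: int, hram_n: int, we_n: int):
    """Classify a host access as (kind, is_write), or None when HREQ# is inactive."""
    if hreq_n != 0:
        return None                      # HREG#/HRAM# are only valid with HREQ#
    if hreg_n == 0 and hram_n == 0:
        raise ValueError("HREG# and HRAM# both active: not described in Table 1-3")
    is_write = we_n == 0                 # WE# active selects a write
    if hram_n == 0:
        return ("vram", is_write)
    if hreg_n == 0:
        return ("register", is_write)
    return ("external", is_write)        # both inactive: host external cycle


def memory_cycle(trnfr_n: int, rfsh_n: int, we_n: int) -> str:
    """Classify an 82750PB/VRAM cycle from TRNFR#, RFSH# and WE#."""
    if trnfr_n == 0 and rfsh_n == 0:
        raise ValueError("TRNFR# and RFSH# both active: not described in Table 1-3")
    if trnfr_n == 0:
        return "transfer"
    if rfsh_n == 0:
        return "refresh"
    return "write" if we_n == 0 else "read"


def enabled_byte_lanes(be_n: int):
    """Map a 4-bit BE#[3:0] value to the enabled (msb, lsb) ranges of D[31:0]."""
    return [(8 * i + 7, 8 * i) for i in range(4) if not (be_n >> i) & 1]


def register_word_lane(be0_n: int):
    """Host register access: BE#[0] = 0 selects D[15:0], BE#[0] = 1 selects D[31:16]."""
    return (15, 0) if be0_n == 0 else (31, 16)
```

For example, `enabled_byte_lanes(0b1100)` describes a 16-bit write to the low word of a double word, with BE#[1] and BE#[0] asserted.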


Table 1-4. Output Pins

Table 1-5. Input Pins

Symbol       Synchronous/Asynchronous
VBUS[3:0]

*Can be programmed to accept synchronous inputs.
*The reset state is caused by RESET# being active low.

Table 1-6. Input/Output Pins

A[31:2]      High    Reset*, Host Cycle    Synchronous
BE#[3:0]             Reset*, Host Cycle
WE#
D[31:0]

*The reset state is caused by RESET# being active low.

All output pins are floated when RESET# is active low.
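
The VBUS[3:0] description in Table 1-3 states that a code is detected only when it is valid for two successive samples taken on the falling edge of CLKIN. A minimal software model of that comparison is sketched below. The class and method names are hypothetical, and the sketch reports a code on every sample that matches its predecessor; how the device treats a code held for three or more samples is not stated in the text.

```python
class VbusDetector:
    """Two-sample agreement filter for a 4-bit asynchronous code."""

    def __init__(self):
        self.previous = None   # no sample has been taken yet

    def sample(self, code: int):
        """Call once per falling edge of CLKIN; returns the detected code or None."""
        code &= 0xF                       # VBUS is four bits wide
        detected = code if code == self.previous else None
        self.previous = code
        return detected


detector = VbusDetector()
results = [detector.sample(c) for c in (0x5, 0x5, 0x3, 0x9, 0x9)]
print(results)
```

A code that changes between two samples is never reported, which is the purpose of the comparison when the two chips run from unrelated clocks.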
2.0 ARCHITECTURE

Overview

The 82750PB includes a wide instruction word processor that comprises a number of processing, storage, and input/output elements. The wide instruction word architecture allows a number of these elements to operate in parallel. The 82750PB executes one instruction every internal clock cycle or T-cycle. The various elements are connected via two 16-bit buses, the A bus and B bus, as shown in Figure 2-1. During each instruction execution cycle, data can be transferred from a bus source to a bus destination element on both buses.

Registers
{rN: N = 0-15}

There are 16 general-purpose data registers, each 16 bits wide, that are connected to both the A bus and B bus as both sources and destinations. These registers are designated r0-r15. All the registers are functionally identical except r0, which also includes logic for bit shifting and byte swapping. A register can source both the A bus and the B bus in the same cycle. A register cannot be the destination of both the A bus and the B bus in a single instruction.

Because the registers are doubly latched, the same register may be both a source and destination in the same cycle. The result is that the data in the register prior to the current cycle will be driven on the source bus, and the data on the destination bus will be latched into the register at the end of the cycle.

Register r0 has additional logic to allow bit shifting and byte swapping. The value in r0 can be shifted left or right one bit position per instruction cycle. For a right shift, the new MSB is equal to the old MSB; in other words, the value is sign-extended. For left shifting, the new LSB is equal to zero. r0 cannot be shifted and loaded in the same instruction. Byte swapping, on the other hand, only occurs when r0 is being loaded with a value from the A bus or B bus. Byte swapping causes the most significant byte and the least significant byte of the 16-bit value being loaded into r0 to be interchanged. Refer to Chapter 4 for a description of the SHFT microcode field that controls the shifting and swapping operations in r0.

Figure 2-1. 82750PB Block Diagram (blocks shown include the sequencer, microcode RAM, microcode instruction, register file, barrel shifter, interpolator, input/output FIFOs, statistical decoder, pointers, and host/VRAM interface with A[31:2] and BE#[3:0])
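
The r0 shift and byte-swap behavior described in the Registers section can be modeled on 16-bit integers. This is a behavioral sketch with invented function names, not the microcode encoding, which is defined by the SHFT field in Chapter 4.

```python
MASK16 = 0xFFFF


def r0_shift_right(value: int) -> int:
    """One-bit right shift; the new MSB equals the old MSB (sign extension)."""
    value &= MASK16
    return (value >> 1) | (value & 0x8000)


def r0_shift_left(value: int) -> int:
    """One-bit left shift; the new LSB is zero."""
    return (value << 1) & MASK16


def r0_load_swapped(bus_value: int) -> int:
    """Value latched into r0 when byte swapping is applied during a load."""
    bus_value &= MASK16
    return ((bus_value << 8) | (bus_value >> 8)) & MASK16


print(hex(r0_shift_right(0x8002)), hex(r0_shift_left(0x8001)), hex(r0_load_swapped(0x12AB)))
```

Shifting by more than one position takes one instruction cycle per bit on r0.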
ALU
{alu, cc}

The ALU performs 16-bit arithmetic and logic operations, and can also be operated as two independent 8-bit ALUs for the Dual-Add-with-Saturate operation. There are two fields in the microcode instruction that affect the operation of the ALU: the ALUOP field specifies the operation to be performed, and the ALUSS field specifies the source of the two ALU inputs. Refer to Chapter 4 for further information on these fields.

The two ALU operands either come from values held in the ALU input latches or from "eavesdropping" on the A or B buses. The result of any ALU operation is latched in the ALU output register, alu. In a subsequent instruction this result can be transferred to any A or B destination.

The ALU has four condition flag outputs: CarryOut, Sign, Overflow, and Zero. CarryOut is the carry out of the most significant bit position. Sign is equal to the value of the most significant bit of the result. Overflow is the exclusive-OR of CarryOut and the CarryIn to the most significant bit position of the result. Zero is true (a value of -1) if all 16 bits of the ALU result are equal to zero. CarryOut and Overflow are defined as equal to zero for all logical operations. For most ALU operations, the state of these four condition flags is latched when the operation is complete. There are eight operations (nop, a', b', +], -], O*, prof and int) that are exceptions. These operations are performed without disturbing the condition state of the previous ALU operation.

Microcode routines can read and write the ALU condition flag register, cc. This can be used to save and restore the state of these flags. The bit ordering of the ALU condition flags within cc is given in Table 2-1. A complete list of ALU opcodes is given in Table 2-2.
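
The flag definitions above can be made concrete for a 16-bit add. The sketch below computes CarryOut, Sign, Overflow and Zero as the text defines them and packs them into the cc layout of Table 2-1, read here as bit 1 = Carry Out, bit 2 = Overflow, bit 3 = Sign, bit 4 = Zero. Each flag is represented as 0 or 1, and the function names are illustrative.

```python
def alu_add(a: int, b: int):
    """16-bit add returning (result, flags) per the ALU flag definitions."""
    a &= 0xFFFF
    b &= 0xFFFF
    total = a + b
    result = total & 0xFFFF
    carry_out = (total >> 16) & 1
    # Carry into the most significant bit: add the low 15 bits only.
    carry_in_msb = (((a & 0x7FFF) + (b & 0x7FFF)) >> 15) & 1
    flags = {
        "carry": carry_out,
        "overflow": carry_out ^ carry_in_msb,
        "sign": (result >> 15) & 1,
        "zero": int(result == 0),
    }
    return result, flags


def pack_cc(flags: dict) -> int:
    """Place the four ALU flags at bits 1 to 4 of a cc value; other bits are left zero."""
    return (
        (flags["carry"] << 1)
        | (flags["overflow"] << 2)
        | (flags["sign"] << 3)
        | (flags["zero"] << 4)
    )


result, flags = alu_add(0x7FFF, 0x0001)
print(hex(result), flags, bin(pack_cc(flags)))
```

Adding 1 to 0x7FFF is the standard signed-overflow case: no carry out of bit 15, but a carry into it, so Overflow and Sign are both set.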


Table 2-1. Bit Assignments for cc Register

Bit        Condition
Bit 0      False (This bit of the cc is always read as a zero.)*
Bit 1      ALU Carry Out
Bit 2      ALU Overflow
Bit 3      ALU Sign
Bit 4      ALU Zero
Bit 5      Loop Counter Zero*
Bit 6      R0 LSB*
Bit 7      R0 MSB*
Bit 15:8   RESERVED. The state of these bits is undefined when read; write as zeros.

*These are read-only values and are not affected by writes to the cc register.
Table 2-2. ALU Opcodes

Function
No Operation
2's complement of a
Dual Sub. with Sat

The Dual-Add-with-Saturate operation performs independent 8-bit ADDs on the upper and lower bytes of the two ALU operands. The two bytes of the A operand are treated as unsigned binary numbers (00:FF₁₆ corresponds to 0:255₁₀). The two bytes of the B operand are treated as offset binary numbers with an offset of +128 (00:FF₁₆ corresponds to -128₁₀:127₁₀). The upper and lower byte results are treated as 9-bit offset binary, including the carry output of each byte, with a +128 offset (000:1FF₁₆ corresponds to -128₁₀:383₁₀) and are saturated to a range of 0-255₁₀. A result that is less than zero is set equal to zero or 00₁₆ and a result that is greater than +255 is set equal to +255 or FF₁₆.

In fact, this operation is symmetric. Either the A operand or the B operand can be defined as the unsigned binary value, and the other operand will be treated as the offset signed binary value.
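
The arithmetic described above reduces to A + B - 128 on each byte, clamped to 0..255. The following sketch models that on 16-bit operands; the function name is illustrative, and the model describes only the result value, not the hardware data path.

```python
def dual_add_sat(a: int, b: int) -> int:
    """Dual-Add-with-Saturate on two 16-bit operands (two independent byte lanes)."""
    result = 0
    for shift in (0, 8):
        a_byte = (a >> shift) & 0xFF      # unsigned value, 0..255
        b_byte = (b >> shift) & 0xFF      # offset binary, represents b_byte - 128
        lane = a_byte + b_byte - 128      # 9-bit sum with the +128 offset removed
        lane = max(0, min(255, lane))     # saturate to 0..255
        result |= lane << shift
    return result


# Low byte: 240 + (144 - 128) = 256, saturates to 255.
# High byte: 16 + (0 - 128) = -112, saturates to 0.
print(hex(dual_add_sat(0x10F0, 0x0090)))
```

Because the per-byte expression is symmetric in its two inputs, swapping the operands gives the same result, which matches the remark that either operand may be taken as the unsigned value.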


Dual-subtract-with-saturate is similar to dual-add-with-saturate. It calculates A - B + 128 on each 8-bit half of the two 16-bit inputs, and clamps the results to 0 and 255. This can be viewed as subtracting an offset-binary signed byte (-128 to 127) from an unsigned byte (0 to 255).
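
As a worked illustration of the A - B + 128 rule, the sketch below applies it to each byte lane of two 16-bit values. The function name is illustrative.

```python
def dual_sub_sat(a: int, b: int) -> int:
    """Dual-subtract-with-saturate: per byte, clamp(A - B + 128, 0, 255)."""
    result = 0
    for shift in (0, 8):
        a_byte = (a >> shift) & 0xFF      # unsigned value, 0..255
        b_byte = (b >> shift) & 0xFF      # offset binary, represents b_byte - 128
        lane = max(0, min(255, a_byte - b_byte + 128))
        result |= lane << shift
    return result


# A B byte of 0x80 represents zero, so subtracting it leaves A unchanged.
print(hex(dual_sub_sat(0x4040, 0x8080)))
```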


The ALU opcode 'int' generates the MCINT (microcode interrupt) condition. When this condition is detected by interrupt logic in the host CPU interface, and if the Enable MCINT bit in the PROCESSOR CONTROL register is set to a ONE, the host interrupt output, HINT#, will be asserted. Refer to Chapter 3 for further information on host interface.

The 'prof' opcode activates the PMFRZ# pin, and is primarily used for performance monitoring and/or debugging.


Barrel Shifter
{shift, shift-r, shift-rl, shift-l}

The barrel shifter performs a single cycle, n-bit left or right shift. The barrel shifter operates independently of the ALU. The three barrel shifter operations are: shift-r for a right shift with sign extend; shift-rl for a right shift with zero fill; and shift-l for a left shift with zero fill. The shift operation is invoked by writing a 4-bit value (the shift amount) to one of three A bus registers, depending on which of the three operations is to be performed. The operand is taken from the B bus, and the result is stored in the barrel shifter output register, shift. Like the ALU result register,
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Data RAM 


{dramN, *tdramN, ++, --;N= 1-4} 


The Data RAM holds 512, 16-bit words that are ac- 
cessed using four pointers. To access a value in a 
particular location, the microcode routine must first 
load a pointer with the address to be accessed, and 
then perform a read or write using the same pointer. 
In parallel with the data RAM access, the pointer 
can optionally be post-incremented or post-decre- 
mented. The four pointers, referred to as dram7— 
dram4, can be written and read via the A bus. When 
a dram pointer, which is only 9 bits wide, is read onto 
the A bus, its upper seven bits are set to zeros. — 


NOTE: 


The width of the dram pointers may change in 
later versions of the 82750PB. Software should 


not rely on the width of a pointer to, for exam- | 
ple, mask the upper seven bits of a value to 
zero. 


All four pointers can be used to read or write the 


Data RAM from either the A or B bus. Only one Data 
RAM access can be performed in any cycle. A Data 
RAM access is referred to, using C language syntax, 
as *dram7. The * means “the value pointed to by’. 
As another example, *dram3+ + means access the. 
Data RAM using the pointer aram3 and increment 
dram3. The symbol — — in Beg of the ++ would 
indicate autodecrement. 


Loop Counters» 
_ | | {entent2} 


Two 16-bit loop counters are available to microcode 
programs for automatically counting iterations of a 
microcode loop. In parallel with other operations 
performed in an instruction, either loop counter can 


_be decremented, and a conditional branch can be. 


the value in Shift can be read onto the A bus or B 


bus in the following instruction cycle. 


A barrel shifter operation does not affect any of the 
condition flags. 
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made based on the loop counter value being equal 
or not equal to zero. Since the two loop counters 
can be written and read on the A bus, as cnt and 
cnt2 respectively, they can also be used for variable 
storage when not being used as loop counters. The 
loop counters can be written to and decremented 
during the same instruction cycle. The value in the 
counter at the start of the next cycle will be ihe value - 
written to the counter minus one. 


The LC microcode bit determines the loop counter that is selected for
decrementing and/or branching in an instruction. The LC microcode bit
does not affect the loop counter that is written or read over the
A bus, since each loop counter is separately addressable as an A bus
source or destination. Refer to Chapter 4 for a description of the
CNT-- microcode bit that causes the selected loop counter to be
decremented, and for a description of the CFSEL microcode field that is
used to perform a conditional branch based on the selected loop
counter's value.
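

The write-and-decrement behavior in a single cycle can be pictured with
a small model. This Python sketch is an illustration only; the function
name is hypothetical, and the 16-bit wrap-around on decrementing zero is
an assumption that the text does not state.

```python
def next_counter_value(current, written=None, decrement=False):
    """Return the loop counter value seen at the start of the next cycle.

    written   -- value written over the A bus this cycle, or None
    decrement -- True if the CNT-- microcode bit selects this counter
    """
    value = current if written is None else written
    if decrement:
        value -= 1
    # Assumption: the counter is a plain 16-bit register that wraps.
    return value & 0xFFFF
```

For example, writing 8 and decrementing in the same cycle leaves 7 in
the counter for the next cycle.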


Microcode RAM


{mcode1-3, maddr, pc} 


The 82750PB executes instructions stored in an on- 
chip microcode RAM. This RAM holds 512 instruc- 
tions and each instruction is 48 bits wide. Normally, 
to start the microcode processor, the host CPU will 
load a microcode program into the microcode RAM, 
point the program counter, pc, to the start of the 
program, and then release the HALT bit to start exe- 
cuting the microcode program. The microcode proc- 
essor can also load its own microcode RAM to over- 
lay new routines and therefore, does not require 
constant intervention by the host to perform multiple 
operations. 


Writing an instruction into Microcode RAM is done by first loading the
three registers mcode3, mcode2, and mcode1 with the three 16-bit words
of the instruction (the most significant word goes into mcode1), and
then loading the address where the instruction should be written into
maddr.


The host CPU can also read the Microcode RAM by first loading the pc
with the address of the instruction to be read and then reading the
three 16-bit words of the instruction from the mcode1-mcode3 registers.
Normally, this would be done by the host CPU while the 82750PB is
halted. Since mcode1-mcode3 hold the instruction pointed to by the pc
(i.e., the instruction that is about to be executed), reading these
three registers from a microcode routine is normally not useful.


The read registers named mcode1-mcode3 and the write registers also
named mcode1-mcode3 are in fact different registers. Writing values
into mcode1-mcode3 and then reading the values of mcode1-mcode3 will
not read back the same values just written. The read registers hold the
instruction stored in the instruction latch (the instruction to be
executed). The write registers hold an instruction that is about to be
written into microcode RAM.


After writing to maddr to load an instruction into microcode RAM, a
one-cycle freeze occurs, and during the freeze a write to the microcode
RAM takes place. The instruction following the write to maddr can
either jump to the address just loaded or start loading the
mcode1-mcode3 registers with the next instruction to be written.


Here are two examples that illustrate the fact that the 82750PB
requires at least one instruction between the write to maddr and the
execution of the instruction that is loaded by the write to maddr.


Example 1:


        maddr = ADDR1   /* load instruction */
        jmp ADDR1       /* jump to it, this is the extra inst. required between */
                        /* writing to maddr and executing the loaded inst. */
ADDR1:
                        /* here's where new instruction gets loaded */


Example 2:


        maddr = INST
        nop             /* extra instruction */
INST:
                        /* instruction gets loaded here */

When a microcode routine writes to pc, one more instruction is executed
before the jump to the new address takes effect. For example:


        pc = ADDR1
        r0 = r1   jmp ADDR2     /* this instruction gets executed but */
                                /* its jump to ADDR2 is ignored. */
                                /* after this instruction executes r3 = r0 = r1 */
When the host CPU writes to the pc, the instruction at the address that
was written is loaded into the mcode1-mcode3 registers and, when the
microcode processor is released from its Halt condition, this is the
first instruction that will be executed.


When the host CPU reads the pc, the result returned is the address of
the instruction that will be executed when Halt is released, that is,
the address of the instruction held in the mcode1-mcode3 registers.


Horizontal Line Counter


{lcnt}


The 12-bit Horizontal Line Counter is updated by VBUS codes from the
82750DB to track the horizontal display line that is currently being
scanned by the 82750DB. The counter is reset by a VODD code and
incremented each time an HLINE code is received. A value can also be
written into the Horizontal Line Counter, but this is used primarily
for testing the 82750PB. The upper four bits will always read zeros.


Field Counter


{fcnt}


The 4-bit field counter is updated by VBUS codes from the display
processor to keep track of the field count being displayed by the
82750DB. The counter is incremented each time a VODD code or VEVEN code
is received. When reading the field counter, the upper 12 bits will
read zeros. This counter will not be initialized upon reset.
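

The behavior of the two display counters can be summarized in a small
software model. This Python sketch is illustrative only: the class name
and the string encoding of the VBUS codes are hypothetical, the starting
values are arbitrary (the text says the field counter is not initialized
on reset), and wrap-around at the counter width is an assumption.

```python
class DisplayCounters:
    """Model of the 12-bit line counter (lcnt) and 4-bit field counter (fcnt)."""

    def __init__(self, lcnt=0, fcnt=0):
        self.lcnt = lcnt & 0xFFF
        self.fcnt = fcnt & 0xF

    def vbus(self, code):
        """Update the counters for one VBUS code received from the 82750DB."""
        if code == "HLINE":
            self.lcnt = (self.lcnt + 1) & 0xFFF   # one more display line
        elif code == "VODD":
            self.lcnt = 0                         # line counter is reset
            self.fcnt = (self.fcnt + 1) & 0xF     # new field
        elif code == "VEVEN":
            self.fcnt = (self.fcnt + 1) & 0xF     # new field, lcnt untouched
```

Reading either register onto a 16-bit bus returns the stored value with
the unused upper bits as zeros, which the masks above reproduce.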


Input FIFOs


{inN-lo, inN-hi, inN-c, *inN; N = 1, 2}


There are two input channels, referred to as input FIFOs, through which
the processor can read pixels or data from VRAM. Each channel
automatically fetches 64-bit quad words from VRAM and breaks them into
8-bit bytes or 16-bit words that are read by microcode. Each input FIFO
operates independently and can be programmed to automatically increment
or decrement through bytes or words in VRAM. The FIFOs are double
buffered so that while values are being extracted from one quad word
(64 bits), the next quad word is being prefetched from VRAM.
The mode control register for each input FIFO, designated in1-c or
in2-c, contains four mode bits as seen in Figure 2-2. The WORD/BYTE bit
(bit 0) determines whether the input FIFO is in word mode
(WORD/BYTE = 0) or byte mode (WORD/BYTE = 1). In byte mode, the FIFO
can start reading on any byte boundary and in word mode on any word
boundary.


The INC/DEC bit (bit 1) determines the order that bytes or words are
read from VRAM. In INCREMENT mode, with INC/DEC = 0, the FIFO reads
from the least significant byte or word to the most significant byte or
word of each double word and increments through double words in VRAM.
In DECREMENT mode, with INC/DEC = 1, the FIFO reads from most
significant byte or word to least significant byte or word within a
double word and decrements through double words in VRAM.


The AHOLD bit (bit 2) selects the address hold mode. When asserted
(bit 2 = 1), the automatic address increment/decrement function is
disabled and the input FIFOs will not double buffer VRAM data. In other
words, at the end of a VRAM cycle, when the FIFO has been updated with
64 bits of VRAM data, the input FIFO will not issue another MREQ# until
there is a write to the address-lo register or a roll-over/roll-under
read access of the input FIFO. If a roll-over/roll-under occurs, then a
memory request will be issued to fetch data from the same VRAM
location. If there is a write to the address-lo register, the FIFO will
then fetch data from the new location.


The PREFETCH OFF bit (bit 3) specifies whether the FIFO will
automatically prefetch successive quad words from VRAM or will only
fetch a new quad word when a value from that quad word is requested. In
PREFETCH-ON mode, bit 3 = 0, the input FIFO prefetches successive quad
words from VRAM as necessary to keep its buffer full (either from
ascending or descending addresses, depending on the state of the
INC/DEC bit). In PREFETCH-OFF mode, the FIFO will still prefetch the
first two quad words to fill its buffer (when started at a new address
location), but will only fetch a new quad word when a read request is
made to the FIFO for a value in the next unfetched quad word.


The CB bit (bit 4) allows circular buffers of sizes 64 Kbytes,
128 Kbytes, or 256 Kbytes to be created in VRAM. The buffer size is
selected by programming the least significant 3 bits of the circular
buffer register (circbuf). To enable this feature, the CB bit must be
set to 1; then, depending on the buffer size selected, the appropriate
address pin that goes off chip will be forced to 0 (register pointers
remain unchanged). Table 2-3 shows the programming combinations of the
circular buffer register.


It is important to note that the internal address 
counters themselves are not affected by the circbuf 
function. Only the selected external address pin is 
forced to ‘0’. 


Table 2-3. Circular Buffer Register (circbuf) 


(If Function Enabled) 


In “BY-32” MODE (bit 3), the pointer increments or 
decrements by 32 bits, independent of whether the 
FIFO is in 8-bit pixel mode or 16-bit pixel mode. This 
mode was added to facilitate microcode that oper- 
ates on one component of a 32-bit per pixel image. 


The standard sequence for initializing an input FIFO is to write to the
control register (in-c), the high address (in-hi), and then the low
address (in-lo) of the appropriate FIFO. Refer to the access state
diagram in Chapter 3. The write to in-lo causes the FIFO to start
reading from VRAM. A byte or word is then read from *in. Successive
reads from *in will read sequential bytes or words from VRAM. Writing
to the control register each time the FIFO is started at a new address
is not necessary, except to change the FIFO's mode. Also, if the new
address is within the same 64 kByte page of VRAM, only the lo-address
needs to be written in order to start the FIFO reading from the new
address.
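

The initialization sequence and the effect of the WORD/BYTE and INC/DEC
bits can be sketched with a simple functional model. This Python example
is illustrative only: the class and method names are hypothetical, VRAM
is assumed to be byte addressed with the 32-bit address formed from the
hi and lo registers, words are assumed to be stored least significant
byte first, and buffering, prefetch, and freeze behavior are not
modeled.

```python
WORD_BYTE = 1 << 0   # 0 = word mode, 1 = byte mode
INC_DEC = 1 << 1     # 0 = increment, 1 = decrement


class InputFifoModel:
    """Functional sketch of one input FIFO (no timing, no double buffering)."""

    def __init__(self, vram):
        self.vram = vram
        self.ctrl = 0
        self.hi = 0
        self.addr = 0

    def write_ctrl(self, value):      # models a write to inN-c
        self.ctrl = value

    def write_hi(self, value):        # models a write to inN-hi
        self.hi = value & 0xFFFF

    def write_lo(self, value):        # models a write to inN-lo; starts the FIFO
        self.addr = (self.hi << 16) | (value & 0xFFFF)

    def read(self):                   # models a read from *inN
        size = 1 if self.ctrl & WORD_BYTE else 2
        value = int.from_bytes(self.vram[self.addr:self.addr + size], "little")
        self.addr += -size if self.ctrl & INC_DEC else size
        return value
```

Restarting within the same 64 kByte page only needs write_lo, which
mirrors the note above about rewriting only the lo-address.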


If microcode attempts to read a value from an empty input FIFO, the
processor is frozen prior to the execution of the instruction, until
the FIFO's control logic has fetched another double word from VRAM and
extracted the next value. At this point, the processor is released from
the frozen state, and the instruction that reads the value is executed.
When the processor is frozen waiting for a particular FIFO that isn't
yet ready, that FIFO's VRAM access priority is raised above all other
FIFOs.


Bits: 15-6 5 4


64 Kbytes | Address Pin 16 Forced to 0 
Output FIFOs


{outN-lo, outN-hi, outN-c, *outN, outN++; N = 1, 2}


There are two output channels, referred to as output 
FIFOs, through which the graphics processor writes 
pixels or data to VRAM. Each channel automatically 
collects bytes or words into 64-bit quad words and 
writes the quad words to VRAM. Each output FIFO 
operates independently and can be programmed to 
write bytes or words into sequential addresses in 
VRAM (either incrementing or decrementing). The 
FIFOs are double buffered so that while one quad 
word is waiting to be written to VRAM, the next quad 
word can be assembled from individual bytes or 
words. 


The mode control register for each output FIFO, designated out1-c or
out2-c, contains six mode bits as shown in Figure 2-3. The WORD/BYTE
bit (bit 0) determines whether the output FIFO is in word mode
(WORD/BYTE = 0) or byte mode (WORD/BYTE = 1). In byte mode the FIFO can
start writing on any byte boundary in VRAM and in word mode on any word
boundary.


The INC/DEC bit (bit 1) determines the order that bytes or words are
written to VRAM. In INCREMENT mode, with INC/DEC = 0, the FIFO writes
from the least significant byte or word to the most significant byte or
word in a double word and increments through double words in VRAM. In
DECREMENT mode, with INC/DEC = 1, the FIFO writes from most significant
byte or word to least significant byte or word within a double word and
decrements through double words in VRAM.


When the AHOLD bit (bit 2) is set, the output FIFO quad word address is
not incremented or decremented. In this mode, the FIFO continues to
output to a single quad word in VRAM.


The FORCE-LSB bits (bits 3 and 4) are used to force the least
significant bit of each byte written to VRAM to either a zero or a one.
This can be used, for example, to force the LSB to the correct polarity
when writing to the U bitmap during motion video decompression. In
certain display modes for the 82750DB, the LSB of the 8-bit samples in
the U or Y bitmap are used to select VIDEO or GRAPHICS display mode for
the n x n group of display pixels corresponding to the particular U or
Y sample. A one in the FORCE-LSB ENABLE bit (bit 4) enables the
forcing; a zero results in normal operation. The FORCE-LSB VALUE bit
(bit 3) is used as the value to which the LSB is forced. Whether in
byte mode or word mode, the LSB of each byte is forced to the FORCE-LSB
value.


Bits  | Field
15-6  | Set to Zeros
5     | BY-32 MODE
4     | FORCE-LSB ENABLE
3     | FORCE-LSB VALUE
2     | AHOLD
1     | INC/DEC
0     | WORD/BYTE


Figure 2-3. Output FIFO Control Register
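

The control bits described above can be expressed as masks, and the
FORCE-LSB behavior as a small function. This Python sketch is
illustrative only; the constant and function names are hypothetical, and
the bit positions are taken from the text of this section.

```python
# Output FIFO control register bit masks (positions as described in the text).
WORD_BYTE = 1 << 0          # 0 = word mode, 1 = byte mode
INC_DEC = 1 << 1            # 0 = increment, 1 = decrement
AHOLD = 1 << 2              # hold the quad word address
FORCE_LSB_VALUE = 1 << 3    # value the LSB is forced to
FORCE_LSB_ENABLE = 1 << 4   # 1 = force the LSB of every byte written
BY_32 = 1 << 5              # step the pointer by 32 bits


def apply_force_lsb(byte, ctrl):
    """Return the byte as it would be written to VRAM under control word ctrl."""
    if not ctrl & FORCE_LSB_ENABLE:
        return byte & 0xFF                  # normal operation
    lsb = 1 if ctrl & FORCE_LSB_VALUE else 0
    return (byte & 0xFE) | lsb
```

For example, a byte-mode FIFO that forces every LSB to one would use the
control value WORD_BYTE | FORCE_LSB_ENABLE | FORCE_LSB_VALUE.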


In "BY-32" MODE (bit 5), the pointer increments or decrements by
32 bits, independent of whether the FIFO is in 8-bit pixel mode or
16-bit pixel mode. This mode is used to facilitate microcode that
operates on one component of a 32-bit per pixel image. The bytes or
words that are skipped over will be unchanged in VRAM.


The standard sequence for initializing an output FIFO is to write to
the control register (out-c), the low address (out-lo), and then the
high address (out-hi) of the appropriate FIFO. A series of bytes or
words is then written to *out. Refer to the access state diagram in
Chapter 3 (Figure 3-1).


In order to flush any remaining data in an output FIFO before changing
its VRAM pointer, it is necessary to write to the control register.
When pointing to a new location in VRAM, if the new address is within
the same 64 kByte page of VRAM, only the lo-address needs to be
written.


There must be one instruction between the write to the output FIFO's
low address and the first write to *outN. Therefore, it is recommended
that outN-lo be written before outN-hi. The write to outN-hi ensures
that this requirement is met. If only the outN-lo value is being
changed, it is still necessary to have one additional instruction
before the first write to *outN.


When writing bytes or words to VRAM through an output FIFO, a byte or
word can be skipped over by writing to outN++ instead of *outN. When
the values are written to VRAM, any byte or word that was skipped will
retain its original value in VRAM, and its value is not altered by the
VRAM write. This can be used when writing a series of pixels, some of
which are "transparent", allowing whatever was behind them to show
through.


If the microcode routine attempts to write a value to a full output
FIFO, the processor is frozen prior to the execution of the
instruction. The processor remains frozen until the FIFO has a chance
to write one of the buffered quad words to VRAM. At that point, the
processor is released from the frozen state, and the instruction that
writes the value is executed. When the processor is frozen, waiting for
a particular FIFO that isn't yet ready, that FIFO's VRAM access
priority is raised above all other FIFOs.


Statistical Decoder


{stat-lo, stat-hi, stat-c, stat-ram, *stat, *stat#}


The Statistical Decoder (also referred to as the Huffman Decoder) is a
specialized input channel that can read a variable-length bit sequence
from VRAM and convert it into a fixed-length bit sequence that is read
by the microcode processor. In image compression, as well as in other
applications such as text compression, certain values occur more
frequently than others. A means of compressing this data is to use
fewer bits to encode more frequently occurring values and more bits to
encode less frequently occurring values. This type of encoding results
in a variable-length sequence in which the length of a symbol (the
group of bits used to encode a single value) can range, for example,
from one bit to sixteen bits.


The statistical code that the statistical decoder can decode is of
either of the two forms:


0x                  1x
10x                 01x
110xxx              001xxx
1110xxxxx           0001xxxxx
...                 ...
11111110xxxxxx      00000001xxxxxx
111111110xxxxxx     000000001xxxxxx


Each symbol of a given length (one per line as shown here) consists of
a run-in sequence followed by some number of x-bits. The run-in
sequence is defined as a series of zero or more ONEs followed by a ZERO
or, as in the code on the right above, zero or more ZEROs followed by a
ONE. The remainder of this description will use examples of the code on
the left. A bit in the decoder's control register determines the
polarity of the run-in sequence bits.


In the example on the left, there would be two symbols of length two:
00 and 01. Each x-bit can take on a ZERO or ONE value. The number of
x-bits following a run-in sequence can range from zero to six. Since
the goal, in general, is to have a few short codes and a larger number
of long codes, codes with fewer run-in bits will typically have fewer
x-bits following. However, this is not a hardware constraint. A code of
this form is completely described by a code description table
indicating, for each length of run-in sequence (R = the number of ONEs
in the run-in), how many x-bits follow the ZERO. The value of R is used
as an index into the code description table. Due to the hardware
implementation, the number actually stored in the table is 2^X, where X
is the number of x-bits.


For the example above, the corresponding code description values are
given in Table 2-4.


Table 2-4. Sample Code Description Table


R   | X | 2^X (dec.) | 2^X (bin.)
0   | 1 | 2          | 000 0010
1   | 1 | 2          | 000 0010
2   | 3 | 8          | 000 1000
3   | 5 | 32         | 010 0000
... |   |            |
7   | 6 | 64         | 100 0000


Note that the table only goes up to symbols with seven ONEs in the
run-in. For symbols with more than seven ONEs, the value of X and 2^X
for seven ONEs is used for all symbols having seven or more ONEs in the
run-in sequence. For example, in the code above a symbol with eight or
more ONEs in the run-in sequence has six x-bits following the ZERO,
which is the same as symbols having seven ONEs.


For each different symbol, including all symbols of the same run-in
length with different x-bit values, the decoder generates a unique
fixed-length, 16-bit value. Some of the decoded values for the sample
code given above are provided in Table 2-5.


Table 2-5. Decoded Values


Symbol*     | Decoded Value
...         | ...
1110 11111  | 43


*The x-bits of the symbol are set apart by a space for clarity.


The algorithm for generating a decoded value from a symbol is as
follows: all symbols of a given run-in length are assigned a base
value, B; the value corresponding to a particular symbol is equal to B
plus the binary value of the x-bits in the symbol. The base value B for
a symbol with a run-in length of R is calculated by:


B(R) = SUM[2^X(r)] for r = 0 to R - 1,


where X(r) corresponds to the X value in the table entry corresponding
to R = r.

For example, in the above code:


B(0) = 0 (B(0) is always zero)
B(1) = 0 + 2 = 2
B(2) = 0 + 2 + 2 = 4
B(3) = 0 + 2 + 2 + 8 = 12
B(4) = 0 + 2 + 2 + 8 + 32 = 44


This is one of the reasons that the table holds 2^X instead of X. The
calculation of B(R) is easier to implement in logic.
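

The base-value calculation and the decoding of a single symbol can be
checked with a short software model. This Python sketch is illustrative
only: the function names are hypothetical, the symbol is passed as a
string of '0' and '1' characters in the order the bits are read, only
the ONEs-then-ZERO polarity is handled, and table entries 4 through 6
are assumed to hold 64 because their values are not given above.

```python
# 2^X values indexed by run-in length R (entries 4-6 are assumed).
SAMPLE_TABLE = [2, 2, 8, 32, 64, 64, 64, 64]


def base_value(table, run_in):
    """B(R): sum of the 2^X entries for all shorter run-in lengths.

    Run-in lengths above seven reuse table entry seven.
    """
    return sum(table[min(r, 7)] for r in range(run_in))


def decode_symbol(bits, table=SAMPLE_TABLE):
    """Decode one symbol; return (decoded value, number of bits consumed)."""
    run_in = 0
    while bits[run_in] == "1":          # count ONEs up to the ZERO
        run_in += 1
    pos = run_in + 1                    # skip the terminating ZERO
    x_count = table[min(run_in, 7)].bit_length() - 1   # X from 2^X
    x_value = int(bits[pos:pos + x_count], 2) if x_count else 0
    return base_value(table, run_in) + x_value, pos + x_count
```

Decoding the symbol 111011111 with this model gives a run-in length of
3, a base value of 12, and an x-bit value of 31, for a decoded value
of 43.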


There are two enhancements that are made to this coding scheme in the
implementation on the 82750PB. These two modes are referred to as END
mode and SHORT mode. If neither END nor SHORT mode is enabled, the
decoding is performed as described above. SHORT mode allows the decoder
to be switched easily to a simpler code format without having to reload
the code description table. In the SHORT form, all symbols have the
same number of x-bits, as though all entries in the table had been
filled with the same value of 2^X. When SHORT mode is invoked, this
value of 2^X is obtained from a field in the statistical decoder's
CONTROL word, instead of from the individual table entries.


END mode is added in recognition of the fact that, for codes with few
symbols, some increase in efficiency is possible by not having to place
a zero at the end of the longest run-in sequence. For example, consider
the code:


0x
10x
110x


The END mode allows us to shorten the last symbol to 11x instead of
110x. The trailing ZERO is not required because the decoder has been
told that the maximum length of a run-in is two ONEs. The resulting
symbol set and corresponding decoded values are given in Table 2-6.


Table 2-6. END Mode Decoded Values


Symbol Decoded Value 
The number of x-bits must be constant for all symbols of the same
run-in length. Therefore, a code such as:


0
10xx
11xxx   <-- NOT CORRECT! Must be 11xx.


is not allowed. The last symbol (11xxx, in this case) uses the same
table entry for 2^X as the next-to-last symbol (10xx) and, therefore,
the last symbol will be 11xx.


The maximum length of the run-in sequence in END mode is specified by
placing an END flag in the code description table. For example, a code
and the corresponding table are shown in Table 2-7.


Table 2-7. END Flag Decoded Values


Table Entries: Index | END Bit


110xxx


The hyphens indicate that those table entries aren't used to decode
this code. Note that the symbol 111xxx has three x-bits because of the
value of 2^X in Index 2; it is not based on the 2^X value in Index 3.


The SHORT and END modes can be invoked simultaneously, resulting in a
code such as:


0x
10x
110x
111x


with a SHORT-2^X value of 2 (for one x-bit in each symbol) and the END
bit set in Index 2.


Packed binary fields with one to seven bits per field can be read using
the statistical decoder by setting the END bit in Index 0 and by
programming the X value to be N-1, where N is the number of bits per
field. For example, three-bit fields would be decoded as shown in
Table 2-8.


Table 2-8. Packed 3-Bit Field Decoded Values


Table Entries


The unpacked bits are in reverse order relative to how they are stored
in VRAM. For example, if three-bit values are packed in VRAM, the
pattern 110 in VRAM is read from right to left and gives an unpacked or
decoded value of 3.


The CONTROL register for the statistical decoder (stat-c) is used to
specify the mode to use for decoding, as well as to invoke certain
modes for writing and reading the code description table. Refer to the
bit assignments for this register below. To write to the code
description table, the WRITE bit (bit 4) is set to a ONE; the starting
table index is reset to zero. Each write to the table causes the index
to increment by one. This index will wrap around from seven back to
zero. For example, to write all eight table entries the user would
write a value of 0x10 to the stat-c register and then write eight 8-bit
values to the register stat-ram. The most significant bit of each 8-bit
value is the END bit, and the lower seven bits are the values of 2^X.
To read the code description table, the TEST bit (bit 5) of the CONTROL
register is set to a ONE. The table entries are then read from the
decoder's data register (*stat). Reads and writes always start at table
entry zero.
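

The layout of a code description table entry (END bit in the most
significant bit, 2^X in the lower seven bits) can be illustrated with a
small helper. This Python sketch is illustrative only; the names are
hypothetical, and it only computes the values that would be written to
stat-c and stat-ram without modeling the hardware itself.

```python
STAT_C_WRITE = 1 << 4   # WRITE bit: enables writing the code description table
STAT_C_TEST = 1 << 5    # TEST bit: enables reading the table back via *stat


def table_entry(x_bits, end=False):
    """Build one 8-bit stat-ram value from the x-bit count and END flag."""
    if not 0 <= x_bits <= 6:
        raise ValueError("the number of x-bits ranges from zero to six")
    return (0x80 if end else 0x00) | (1 << x_bits)


def table_image(x_counts, end_index=None):
    """Return the eight values to write to stat-ram, starting at index zero."""
    return [table_entry(x, end=(i == end_index)) for i, x in enumerate(x_counts)]
```

For packed three-bit fields as described above, the entry at Index 0
would be table_entry(2, end=True), that is, N-1 = 2 x-bits with the END
bit set.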
NOTE:


When reading the code description table, it is necessary to wait one
instruction time between the write to stat-c and the first read from
*stat. An access diagram showing all legal sequences for read and write
FIFO registers is shown in Chapter 3 (Figure 3-1).
The code for reading the eight table entries into the first eight
locations of data RAM would be:


        dram3 = 0
        cnt = 8
        stat-c = 0x20               /test mode to read the stat-ram (the table)
                                    /wait one inst. before first read
LOOP:
        *dram3++ = *stat   cnt--    /two inst. loop necessary to wait one inst.
        jcp loop                    /between each read from *stat.


Bits  | Field
15    | POL
14    | RSVD*
13    | CB
12:8  | SVAL
7     | SHORT
6     | END
5     | TEST
4     | WRITE
3     | RSVD*
2:0   | Starting Stat-ram ADDRESS


* Reserved: write zeros to these bits.


Figure 2-4. Statistical Decode CONTROL Register


END mode is enabled by setting the END bit (bit 6) in the CONTROL
register to a ONE. The SHORT mode is enabled by setting the SHORT bit
(bit 7) in the CONTROL register to a ONE. When in SHORT mode, the five
SVAL bits (bits 12:8) in the CONTROL register are used as the
SHORT-2^X value.


The POL bit (bit 15) determines the polarity of the run-in sequence
bits. If bit 15 = 0, the ONEs-ending-in-ZERO sequence (e.g., 1110xxx)
is selected. If bit 15 = 1, the ZEROs-ending-in-ONE sequence (e.g.,
0001xxx) is selected.


The CB bit (bit 13) allows circular buffers of sizes 64 Kbytes,
128 Kbytes, or 256 Kbytes to be created in memory, as in the case of
the input FIFO. The buffer size is selected by programming the least
significant 3 bits of the circular buffer register (circbuf). To enable
this feature, the CB bit must be set to 1; then, depending on the
buffer size selected, the appropriate address pin that goes off chip
will be forced to 0 (register pointers remain unchanged). Table 2-3
shows the programming combinations of the circular buffer register.


The decoding parameters may be changed between symbols by writing to
the CONTROL register and, if necessary, writing new values into the
code description table. The correct procedure for changing the code
type or decode mode is to read the last value from the decoder prior to
the change, using *stat# instead of *stat. This keeps the decoder from
automatically starting to decode the next symbol. At this point, the
code description table and the SHORT and END mode bits can be changed
as desired. The next time the CONTROL register is written with both
TEST = 0 and WRITE = 0, the decoder will begin to decode the next
symbol using the new parameters.


The statistical decoder buffers one quad word read from VRAM so that
the decoding of bits in one 32-bit word and the fetch of the next
32-bit word may overlap. As with the input and output FIFOs, the
decoder has a VRAM pointer associated with it that points to the
location in VRAM from which it is reading data. This pointer increments
twice each time a new quad word is read; there is no decrement mode.
When the least significant word of the decoder's pointer (stat-lo) is
written, any data that had previously been prefetched from VRAM is
ignored, and the decoder fetches one quad word starting from this new
location.


The 82750PB assumes that the statistically encoded bitstream in VRAM
starts with the least significant bit of a double word. That is, the
two LSBs of the address written to stat-lo are ignored.


The statistical decoder decodes data at a rate of one bit per T-cycle.
To a first approximation, the decode time for an N-bit symbol is:


decode time (in T-cycles) = N + 1


Since it takes at least 64 T-cycles to decode data from one quad word,
which is the time required for eight quad word reads from VRAM, the
decoder should rarely run out of data. Therefore, the above estimate
should very accurately model the actual decoding rate of the
statistical decoder.
The statistical decoder always begins to read the bitstream from the
least significant bit of the double word found at the starting location
in VRAM. That is, the decoder does not start on a byte or word boundary
as an input FIFO or output FIFO does, but only on double word
boundaries. The bitstream moves from the least significant bit to the
most significant bit of a double word and then to the least significant
bit of the next double word (at the next higher address location). For
the x-bits, the first x-bit read from the bitstream becomes the most
significant bit of the x-bit field when it is interpreted as a binary
number. The example below shows a code definition, a bitstream stored
in VRAM, and the resulting decoded values.


The code definition and range of values for each symbol length are
indicated in Table 2-9.


Table 2-9. VRAM Bitstream Decode Values


Symbol   | Values | Comments
0        | 0      |
10x      | 1-2    | 100 = 1, 101 = 2
110xx    | 3-6    | 11000 = 3, ..., 11011 = 6
1110xxx  | 7-14   | 1110000 = 7, ..., 1110111 = 14


Decoding starts at address 0 in this example. The two double words at
addresses 0 and 1 are:


0: 0xAC98E14D
1: 0x372E74CB


— The bitstream in VRAM, with S0l6KS dividing the 
symbols (read from right to left starting at LSB of 
address 0) is shown in Figure 2-5. 


Table 2-10 lists the symbols, in the order they are 
encountered in the bitstream, and the corresponalng 
decoded values. 


Table 2-10. Decoding Symbols


Symbol   | Value | Comments
101      | 2     | Starts at LSB, Address 0, Scanning Left
100      | 1     |
101      | 2     |
0        | 0     |
0        | 0     |
0        | 0     |
0        | 0     |
1110001  | 8     |
100      | 1     |
100      | 1     |
11010    | 5     |
1110100  | 11    | First bit at MSB of Address 0, continued at LSB of Address 1
11001    | 4     |
0        | 0     |
1110011  | 10    |
101      | 2     |
0        | 0     |
0        | 0     |
1110110  | 13    |
0        | 0     |


Address  MSB <---- Read bitstream from LSB to MSB <---- LSB

0        1 : 01011 : 001 : 001 : 1000111 : 0 : 0 : 0 : 0 : 101 : 001 : 101   <- Start

         First bit of a symbol continued at LSB of next double word

1        0 : 0110111 : 0 : 0 : 101 : 1100111 : 0 : 10011 : 001011


Figure 2-5. VRAM Bitstream Decoding Addresses 
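The decoding walk-through above can be sketched in Python. This is a minimal model, not the device's implementation: the names `CODE`, `bits_lsb_first`, `decode`, and `decode_time` are illustrative, and the code table is taken from Table 2-9 (a run of leading 1s, a terminating 0, then the x-bits, with the first x-bit read as the most significant). The decode-time helper applies the first-order estimate of N + 1 T-cycles per N-bit symbol.

```python
# Number of leading 1 bits -> (count of x-bits, value when all x-bits are 0).
# Assumed from Table 2-9: 0, 10x, 110xx, 1110xxx.
CODE = {0: (0, 0), 1: (1, 1), 2: (2, 3), 3: (3, 7)}


def bits_lsb_first(dwords):
    """Yield bits of each 32-bit double word, LSB first, ascending addresses."""
    for word in dwords:
        for i in range(32):
            yield (word >> i) & 1


def decode(dwords):
    """Return a list of (symbol_length_in_bits, value) pairs."""
    stream = list(bits_lsb_first(dwords))
    pos, out = 0, []
    while pos < len(stream):
        ones = 0
        while pos + ones < len(stream) and stream[pos + ones] == 1:
            ones += 1
        if ones not in CODE or pos + ones >= len(stream):
            break  # undefined prefix or symbol truncated by end of data
        x_count, base = CODE[ones]
        start = pos + ones + 1  # skip the terminating 0
        if start + x_count > len(stream):
            break
        x = 0
        for bit in stream[start:start + x_count]:
            x = (x << 1) | bit  # first x-bit read becomes the MSB
        out.append((ones + 1 + x_count, base + x))
        pos = start + x_count
    return out


def decode_time(symbols):
    """First-order estimate: N + 1 T-cycles for each N-bit symbol."""
    return sum(length + 1 for length, _ in symbols)


symbols = decode([0xAC98E14D, 0x372E74CB])
values = [value for _, value in symbols]
```

With the two example double words, `values` reproduces the sequence in Table 2-10, including the symbol that straddles the double-word boundary.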
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240854-5 


240854-6 | 


Figure 2-6. Pixel Interpolation 


Pixel interpolator 


{Pixint-c, Pixint} 


The pixel interpolator performs bilinear interpolation
on four 8-bit pixels to generate, in effect, a pixel
shifted by a fraction of a pixel position. See Figure
2-6. If the four pixels have values of A, B, C, and D,
and the horizontal weight and vertical weight are h
and v, respectively, the interpolated value W, ignoring
any quantization effects, is given by:


W = A*(1-h)*(1-v) + B*h*(1-v) + C*(1-h)*v + D*h*v


The values of h and v are even multiples of 1/16.
Figure 2-6 illustrates pixel interpolation with an h
weight of 6/16 or 3/8 and a v weight of 10/16 or
5/8.
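The formula above can be checked numerically with a short Python sketch. The function name `interpolate` and the convention of passing the weights as numerators over 16 are illustrative choices; the result is left unrounded because the quantization performed by the hardware is not described here.

```python
def interpolate(a, b, c, d, h16, v16):
    """Bilinear interpolation of four pixels, ignoring quantization.

    h16 and v16 are the weight numerators (0..15); the implied
    denominator is 16, so h = h16/16 and v = v16/16.
    """
    if not (0 <= h16 <= 15 and 0 <= v16 <= 15):
        raise ValueError("weights must be 4-bit numerators")
    h = h16 / 16
    v = v16 / 16
    return (a * (1 - h) * (1 - v) + b * h * (1 - v)
            + c * (1 - h) * v + d * h * v)


# Weights used in Figure 2-6: h = 6/16, v = 10/16.
w = interpolate(10, 20, 30, 40, 6, 10)
```

From the formula, B is weighted by h and C by v, so A and B lie on the first input row and C and D on the second.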


The pixel interpolator can operate in two modes: sequential-2D
and random-2D. Sequential-2D mode is
used for motion video decoding and when an array
of pixels is interpolated with a common weighting.
Random-2D mode is used either when the pixel arrays
to be interpolated are not adjacent pixels in two
rows or when the weight is changed for each interpolation.
(The word random is used here to mean
non-sequential.)
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The example in Figure 2-7 shows a single row of 
pixels being interpolated in Sequential-2D mode us- 
ing two rows from the original (Source) bitmap. The h 
and v weighting are constant for all the interpolated 
pixels. In this case, the weights appear to be approx- 
imately h = 10/16 and v = 6/16. 


—First Input Row 


—Interpolated Row 
—Second Input Row 


Figure 2-7. Sequential-2D Pixel Interpolation 


The pixel interpolator is pipelined and requires some 
startup sequence to fill the pipeline. Once filled, the 
pixel interpolator generates a new interpolated pixel 
every two T-cycles when in Sequential-2D mode. 
Source pixels are written into the interpolator as pix- 
el pairs. In the case above, the pixel pair BA would 
be written first, followed by the pixel pair DC. It would 
seem more natural to refer to the pixel pair as AB, 
but because of the way 8-bit pixels are arranged in 
16-bit words in VRAM, the left-most pixel on the 
screen is the least significant byte position. For example,
if pixel A had a hex value of 0xAA and B had
a value of 0xBB, the 16-bit word containing pixels A
and B would have a value of 0xBBAA.
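A small sketch of the pixel-pair byte order described above. The helper names `pack_pair` and `unpack_pair` are hypothetical; they only illustrate that the left-most pixel occupies the least significant byte.

```python
def pack_pair(left, right):
    """Pack two 8-bit pixels into a 16-bit pair, left pixel in the low byte."""
    return ((right & 0xFF) << 8) | (left & 0xFF)


def unpack_pair(pair):
    """Return (left, right) pixels from a 16-bit pixel pair."""
    return pair & 0xFF, (pair >> 8) & 0xFF


pair_ba = pack_pair(0xAA, 0xBB)  # pixels A and B from the text
```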


Then, two pixels are read from the interpolator. Because
the pipeline isn't full yet, these pixels are read
and discarded. This loop of writing two pixel pairs
and reading two output pixels is performed four times.
The two pixels that are read the fourth time are the
first two valid output pixels: W and X. The interpolator
may also collect output (interpolated) pixels into
pixel pairs. For example, pixels W and X, instead of
being output separately, would be combined into a
16-bit pixel pair XW. Since there are two possible
phase relationships between the input pixel pairs
and output pixel pairs, the desired phasing (either X
and W paired or Y and X paired) can be specified.


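Bit 15     RESERVED - Write as ZERO
Bit 14     Pipelining Select (1 = Fast, 0 = Standard)
Bit 13     Phase (0 = In Phase, 1 = Opposite Phase)
Bit 12     RESERVED - Write as ZERO
Bit 11     Pairing (1 = Output Pixel Pairs, 0 = Single Pixels)
Bit 10     Reset Bit (1 = Reset, 0 = Normal)
Bits 9:8   Mode Select Bits
Bits 7:4   Vertical Weight
Bits 3:0   Horizontal Weight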


Figure 2-8. Pixel Interpolator Control Register 


Random-2D interpolation is used either when the
pixels to be interpolated are not in horizontal rows or
when the weight is changed for each interpolated
pixel. Examples of this are smooth warping or
smooth scaling operations. In the case of Random-2D,
the processing for successive interpolated pixels
cannot take advantage of pipelining; each pixel
is considered to be the first pixel of a Sequential
mode interpolation. The weight and the two input
pixel-pairs are written into the interpolator. After
waiting at least 10 T-cycles, the one interpolated pixel
can be read. (The delay is 10 T-cycles when in the
standard mode (bit 14 = 0) and 6 T-cycles when in
the fast mode (bit 14 = 1).) Then, the next two input
pixel-pairs and, if necessary, the new weight value
are written, and 10 cycles later the next interpolated
pixel can be read.


The h and v weight values, the mode selection, and
other control bits are written to the pixel interpolator
control register (avg-c). The bit assignment for this
register is in Figure 2-8. The least significant byte
holds the 4-bit v value (bits 7:4) and the 4-bit h value
(bits 3:0).


NOTE:
The values used for h and v here are numerators
of the fraction where the implied denominator is
16.


MODE SELECT 


Bits 8 and 9 are used to select one of four operating
modes, of which only two are presently defined.
These modes are given in Table 2-11.


Table 2-11. Mode Select Operating Modes


Bits 9:8 | Mode
00       | RANDOM-2D


RESET 


Writing a ONE to bit 10 resets the pixel interpolator.
The pixel interpolator must be reset prior to changing
modes.


PAIRING


A ZERO in bit 11 causes the pixel interpolator to
output individual pixels. A ONE causes the interpolator
to collect adjacent pixels (in Sequential-2D
mode) into 16-bit pixel pairs. This feature assists in
motion video decoding, when combined with the
ALU's dual-add-with-saturate operation, by allowing
two pixels to be processed each cycle. The phasing
used in collecting the pixel pairs is determined by the
Phase bit described below.


PHASE 


When output pixels are collected into pixel pairs,
there are two possible alignments of the input pixel
pairs to the output pixel pairs. The Phase bit (bit 13)
selects the alignment to be used, based on the relative
word alignment of the source and destination
bitmaps in VRAM. When the Phase bit is set to a
ZERO, this indicates that the bitmaps are in-phase.
In this case, the first two output pixels are grouped
into one 16-bit pixel pair (with the first pixel in the
least significant byte). When the Phase bit is set to a
ONE, the bitmaps are out-of-phase. In this case, the
first pixel is placed in the most significant byte of the
first pixel pair, with invalid data in the least significant
byte, and the second and third output pixels are collected
into the second pixel pair. This is illustrated in
Figure 2-9.


PIPELINING 


A ZERO in bit 14 causes the pixel interpolator to use
the standard amount of pipeline delay. A ONE in this
field selects the fast mode, which has less pipeline
delay. Table 2-12 shows the pipelining delay for both
modes. Note that the effect of the phase bit is to add
an extra pixel delay.


[Figure: in-phase and out-of-phase alignments, each showing the
1st row of input pixel pairs, the output pixel pairs, and the
2nd row of input pixel pairs.]


Figure 2-9. Pixel Pair Phases 


Table 2-12. Pipelining Delay for Sequential-2D NON-PAIR Mode


Pipelining Bit (Bit 14) | Phase Bit (Bit 13) | Pipeline Delay in Output Pixels


When in PAIR mode (with bit 11 = 1), the amount
of pixel delay does not change, but half as many
reads and writes are required to fill the pipeline because
each read or write of the averager transfers
two pixels. For example, when in the standard mode
(bit 14 = 0), with zero phase (bit 13 = 0) and pair
mode (bit 11 = 1), three indeterminate pixel pairs
must be read before the first good pixel pair is read.
In the same case but with the phase bit = 1, the
fourth pixel pair read contains one good pixel and
one indeterminate pixel, and the fifth pixel pair read
contains two good pixels.


RESERVED 


Bits 15 and 12 are reserved for future use. Write 
ZEROs into these bit positions. 
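The bit assignments described above can be collected into a helper that builds a control-register value. This is a sketch only: `pack_interp_control` and its parameter names are hypothetical, and `mode` is passed as the raw 2-bit value for bits 9:8 (00 selects Random-2D per Table 2-11).

```python
def pack_interp_control(h, v, mode=0, reset=False, pairing=False,
                        phase=False, fast=False):
    """Build a 16-bit pixel interpolator control word.

    h, v : 4-bit weight numerators (implied denominator 16)
    mode : raw 2-bit mode select value for bits 9:8
    """
    if not (0 <= h <= 15 and 0 <= v <= 15):
        raise ValueError("h and v must fit in 4 bits")
    if not 0 <= mode <= 3:
        raise ValueError("mode must fit in 2 bits")
    word = h | (v << 4) | (mode << 8)   # bits 3:0, 7:4, 9:8
    word |= int(reset) << 10            # 1 = reset
    word |= int(pairing) << 11          # 1 = output pixel pairs
    word |= int(phase) << 13            # 1 = opposite phase
    word |= int(fast) << 14             # 1 = fast pipelining
    return word                         # bits 12 and 15 stay ZERO


# h = 6/16, v = 10/16, pair output, fast pipelining.
ctrl = pack_interp_control(6, 10, pairing=True, fast=True)
```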


Signature Register 
| { hwic} 


The signature register can be read either by the host
CPU or by microcode to determine the version of the
82750PB. The value of the signature register can be
used to distinguish between the 82750PB in the
82750PA emulation mode and the 82750PB in native
mode. The currently defined signature values are
given in Table 2-13.


Table 2-13. Signature Values 


0xFFFE | The 82750PB Emulating the 82750PA
0xFFFC | The 82750PB in Native Mode


All other signature values are presently undefined
but may be used in the future to denote other versions
of the 82750 architecture.
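A host-side check of the signature value might look like the following sketch. How the register is actually read is outside this example; `describe_signature` is a hypothetical helper that only maps a value already read to the meanings in Table 2-13.

```python
SIGNATURES = {
    0xFFFE: "82750PB emulating the 82750PA",
    0xFFFC: "82750PB in native mode",
}


def describe_signature(value):
    """Map a signature register value to its Table 2-13 meaning."""
    return SIGNATURES.get(value, "undefined")


mode = describe_signature(0xFFFC)
```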


Display Format Registers 
{yeven, yodd, vu, vptr}


The 82750PB's processor can write to the display
registers in the VRAM interface. These registers are
pointers and pitch values that address display bitmaps
and 82750DB register loads in VRAM. Pointers
are 32-bit values that specify the
starting byte address of a bitmap or register load
within a 4 GByte address space. The bottom two
address bits are ignored since display bitmaps and
register loads must start on a double word boundary.
Therefore, the internal representation of a pointer is
a 30-bit value. The pitch value associated with each
pointer indicates the number of bytes between the
start of two lines of a display bitmap or between the
start of two register loads. The pitch is a single 16-bit
value with its two least significant bits ignored, since
the pitch must be an integer number of double
words. Currently, there is also a restriction in the
82750DB limiting all display bitmap pitches to powers
of two; so, the maximum display bitmap pitch is
±2^14 Bytes = ±16 kBytes. The display registers
are described in Table 2-14.
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Table 2-14. Display Registers 


yeven-lo, hi | This register pair points to the start of the Y bitmap or main bitmap that
             | is to be displayed during an even field scan.

yodd-lo, hi  | This register pair points to the start of the Y bitmap or main bitmap that
             | is to be displayed during the odd field scan.

ypitch       | The value in this register is added to the current Y bitmap pointer value
             | each time a Y transfer is performed.

vu-lo, hi    | This register pair points to the start of the VU bitmap. This bitmap is
             | read to generate the VU values for both odd and even field scans.

vupitch      | This value is added to the current VU bitmap pointer value each time a
             | VU transfer is performed.

vptr-lo, hi  | This register pair points to the start of a series of 82750DB register
             | loads stored in VRAM.

vpitch       | This value is added to the current 82750DB register load pointer each
             | time a 82750DB register load is performed. The pitch is equal to the
             | number of bytes from the start of one register load to the start of the
             | next register load.
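The pointer and pitch rules for the display registers can be illustrated with a short sketch. The function names are hypothetical, and the model simply masks off the two ignored low bits and steps the pointer by the pitch once per transfer; it treats the power-of-two limit as applying to the pitch magnitude.

```python
def line_start(start, pitch, line):
    """Byte address used for a given display line (0-based).

    The two low bits of both the start pointer and the pitch are
    ignored, and the address wraps within the 4 GByte space.
    """
    return ((start & ~3) + line * (pitch & ~3)) & 0xFFFFFFFF


def is_valid_display_pitch(pitch):
    """True if the pitch magnitude is a power of two from 4 to 2**14 bytes."""
    magnitude = abs(pitch)
    return 4 <= magnitude <= (1 << 14) and magnitude & (magnitude - 1) == 0


third_line = line_start(0x1002, 1024, 3)
```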


3.0 HARDWARE INTERFACE


VRAM Interface


The VRAM interface performs the following operations:


• Maintains VRAM pointers for the two input FIFOs,
  the two output FIFOs, the statistical decoder, the
  Y (main) bitmap, the VU bitmap, and the
  82750DB register load.

• Decodes VBUS codes and takes appropriate actions
  such as generating a transfer cycle, scheduling
  refresh cycles, or generating interrupt conditions.

• Arbitrates VRAM accesses between the two input
  FIFOs, the two output FIFOs, the statistical decoder,
  the transfer request logic, the VRAM refresh
  logic, and the external VRAM access logic.

• During a memory cycle, performs appropriate address
  arithmetic on the VRAM pointer used for
  that memory cycle.

• As a result of certain VBUS codes, performs a
  shadow copy that consists of copying display-related
  VRAM pointer values from shadow registers
  (that are loaded by the host CPU or the microcode
  processor) to working registers where the
  various pointers are used for transfer cycles
  when the 82750DB is refreshing the display
  screen.


Table 3-1. VRAM Interface Signals


MREQ#   | MEMORY REQUEST is asserted during the first cycle of a VRAM
        | memory access.
TRNFR#  | The TRANSFER output indicates the current memory cycle is a result
        | of a 82750DB transfer request.
RFSH#   | The REFRESH output indicates the current memory cycle is a result of
        | a 82750DB refresh request.
NXTFST# | The NEXT FAST output indicates the next memory access will use the
        | same row address as the current memory access. This facilitates the
        | use of page mode memory accesses.
MRDY#   | The MEMORY READY input indicates the availability of valid data on
        | the D[31:0] pins.
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VRAM ACCESSES 


The 82750PB can initiate five different types of
memory accesses: FIFO read, FIFO write, transfer
read, transfer write, and refresh. In addition, the
82750PB supports VRAM accesses by external logic.
During an external access VRAM cycle, the
82750PB tri-states its VRAM address and data buses
and performs a host VRAM read or host VRAM
write cycle. There is another operation performed by
the 82750PB, a shadow copy, that is not a VRAM
cycle but is arbitrated as though it were, since no
VRAM cycles can take place during a shadow copy.


The seven types of VRAM cycles initiated by the
82750PB, including host VRAM read and host
VRAM write, begin with the 82750PB asserting a
combination of its three VRAM cycle definition outputs:
TRNFR#, RFSH#, and WE#. External logic
detects the state of these signals, validated by
MREQ#, and produces the appropriate sequence of
VRAM control signals (RAS, CAS, etc.) to perform
the type of memory cycle the 82750PB has requested.
The 82750PB requires that each of these VRAM
cycles take a minimum of two T-cycles, or T-states,
denoted T1 and T2. External logic can insert additional
T2 states in order to stretch the VRAM cycle
to more than two T-cycles. The start of a new VRAM
access cycle is signaled by the assertion of MREQ#
for the first T-cycle, T1. The VRAM access cycle
definition signals, TRNFR#, RFSH#, and WE#, are
asserted at the start of T1 and remain asserted until
the end of the last T2. Other VRAM operations can
be described similarly by sequences of T-states. Refer
to Figures 3-4 and 3-5 on page 42 for timing diagrams.


Table 3-2 defines the states used for all VRAM access
operations. A state diagram for the VRAM/Host
Interface is provided in Figure 3-1. This diagram
includes the FIFO access states


Table 3-2. 82750PB VRAM Access States


Ti      | Idle State, No VRAM Activity
T1, TF1 | First State of a VRAM FIFO Cycle
T2, TF2 | Last State of a VRAM FIFO Cycle
TSC     | The T-State required to perform a shadow copy


[State diagram labels: FIFO ACCESS, HOST ACCESS, MEMORY NOT READY,
ADDRESS NOT VALID, EXTERNAL VRAM REQUEST, TRANSFER REQUEST,
REFRESH REQUEST, TRANSFER CYCLE, REFRESH CYCLE.]
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Figure 3-1. Access State Diagram 
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Note that during successive VRAM cycles it is not 
necessary to go back to the idle state, Ti, between
each cycle; the T2 state can be followed directly by
a T1 state, starting the next VRAM cycle. This
results in efficient utilization of the 82750PB/VRAM
bandwidth by allowing a VRAM cycle time of 2
T-states.
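The T-state sequencing for back-to-back VRAM cycles can be sketched as follows. The function `tstate_sequence` is illustrative only; it takes, for each cycle, the number of additional T2 states inserted by external logic and assumes no idle (Ti) states between cycles.

```python
def tstate_sequence(extra_t2_per_cycle):
    """List the T-states for consecutive VRAM cycles.

    Each cycle is T1 followed by at least one T2; external logic may
    stretch a cycle by inserting additional T2 states.
    """
    states = []
    for extra in extra_t2_per_cycle:
        if extra < 0:
            raise ValueError("extra T2 count cannot be negative")
        states.append("T1")
        states.extend(["T2"] * (1 + extra))
    return states


# Two minimum-length cycles back to back: 4 T-states in total.
back_to_back = tstate_sequence([0, 0])
```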


FAST VRAM CYCLES 


When the 82750PB performs Data Read or Data
Write VRAM cycles for the input or output FIFOs, it
performs two 32-bit accesses to read or write one
64-bit value. These accesses are always performed
in a sequence of EvenAddress followed by EvenAddress
+ 1, which guarantees both that the two sequential
accesses will be in opposite banks and that
the two accesses will be within the same VRAM
page. This allows external logic to use either bank-interleaving
or a page-mode access to complete the
second access of the sequence and improve the
VRAM bandwidth. However, the second access
does not need to be handled differently from the
first. Except for the assertion of the NXTFST# signal,
both accesses are treated as standard VRAM
accesses. External logic can ignore the NXTFST#
signal, though, and treat the two accesses as two
normal data read or data write cycles. Note that
NXTFST# is not asserted for transfer, refresh, or
host memory accesses.


The NXTFST# output signal is provided for cases
when external logic can generate a faster access for
the second access of the two sequential accesses.
During such a pair of accesses, NXTFST# is asserted
during the first of the two accesses in order to
provide sufficient time for the external logic to generate
the appropriate fast memory cycle for the second
access. Refer to the timing diagrams in Figures
3-4 and 3-5 (page 42) for examples illustrating the
use of the NXTFST# signal.
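The address pairing for a 64-bit FIFO access can be sketched as below. The helper `fifo_access_pair` is hypothetical, and addresses are expressed as double-word addresses. Selecting the bank from the low address bit is an assumption made for illustration; the actual bank and page mapping is determined by the external logic.

```python
def fifo_access_pair(even_dword_addr):
    """Return the two double-word addresses used for one 64-bit access."""
    if even_dword_addr & 1:
        raise ValueError("first access must use an even double-word address")
    return even_dword_addr, even_dword_addr + 1


first, second = fifo_access_pair(0x1000)
# The two addresses differ only in bit 0, so with bank = address & 1
# (assumed) they fall in opposite banks and share all upper address bits.
banks = (first & 1, second & 1)
```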


VBUS CODES


Transfer request, interrupt, and synchronization
codes are sent over the VBUS from the 82750DB to
the 82750PB. The codes recognized by the
82750PB are listed in Table 3-3, along with the actions
taken by the 82750PB as a result of receiving
each code. Codes that cause TRANSFER cycles
must be asserted for at least two clock cycles of the
82750PB to ensure that, in the worst case, the
82750PB completes the transfer cycle before the
code is released and the 82750DB starts shifting
data from the VRAM shift registers. Other codes
must also be asserted for a minimum of two
82750PB clock cycles. Only the codes given in
Table 3-3 are valid codes for the VBUS. Other codes
are reserved for future use and should not be used.
Once a transfer cycle code is sent to the 82750PB,
any non-transfer code may be sent immediately. A
subsequent transfer cycle code should be sent only
after the current transfer cycle is completed.
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Table 3-3. VBUS Codes 


Code | Name  | Action
     | VODD  | VBI Int; OF Int; Shadow Copy Odd; Hline = 0***
1101 | VEVEN | VBI Int; EF Int; Shadow Copy Even
     | HLINE | lcnt++ (Increment Line Counter)
     | NULL  |

NOTES:
*Yc = Y bitmap pointer, current; Yp = Y bitmap pitch; VU = VU bitmap; V = 82750DB register load.
**Shadow Copy with Yc = Y-start-odd in odd field; Yc = Y-start-even in even field.
***Hline = Horizontal Line Counter.


PRIORITY


Each time the VRAM state machine completes a
VRAM operation and returns to the Ti state, it examines
all pending VRAM access requests and selects
the highest priority request for the next VRAM operation.
The priority ordering of these requests is listed
in Table 3-4.


Table 3-4. Priority of VRAM Operations


Shadow Copy
Host Access
VRAM Refresh
FIFO Read/Write


NOTE:
The shadow copy is treated as a VRAM operation even
though it does not result in an access to VRAM.


The VRAM refresh operation is placed low on the
priority list to reduce the latency in servicing transfer
requests and external VRAM requests. Since a single
REFRESH code from the 82750DB schedules a
number of refresh cycles, a higher priority for refresh
would cause all the refresh cycles to occur in a burst
that would lock out all lower priority requests until all
refresh cycles completed. Instead, the following
restriction applies to all request types with higher
priority than refresh: high priority requests, such as
transfer cycles, shadow copies, and external VRAM
access must occur infrequently enough to allow
proper refresh of the VRAM chips. Transfer cycles
and shadow copies, by their nature, occur infrequently
so they are not generally a problem.


There is a separate priority scheme for the five FIFO
channels. The scheme used is rotating priority with
automatic override and single cycle arbitration. Rotating
priority means that the priority is assigned in a
fixed cyclic order with the lowest priority given to the
FIFO channel that "won" the last FIFO access.
There is only one level of memory, so the order that
requests arrive is not a factor in the arbitration. The
cyclic order is given in Figure 3-2.


As an example, if input FIFO 0 (abbreviated if0) was
the last channel to perform a cycle, the priority order
for the next FIFO access (from highest to lowest)
would be: if1, sd, of0, of1, and if0.


Automatic override means that the rotating cyclic priority
can be bypassed if there is an URGENT condition
for one of the channels. A channel is urgent if the
microcode processor is frozen because the processor
is waiting for that channel to be ready. The channel
can be either an input channel that is empty or
an output channel that is full. In this case, the urgent
channel gets the next available cycle. However, the
priority will still be lower than non-FIFO requests
such as refresh cycles.


Single clock cycle arbitration means that the selection
of the next channel that will get an access occurs
in a single T-cycle or T-state, either in a Ti state
or during the last T2 state of the previous VRAM
cycle.
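The rotating-priority scheme with automatic override can be modeled in a few lines. This is a behavioral sketch, not the arbitration logic itself: the names `CYCLE`, `priority_order`, and `arbitrate` are illustrative, and the rotation direction is taken from the worked example in the text (after if0 wins, the order is if1, sd, of0, of1, if0).

```python
# Cyclic order implied by the worked example for if0.
CYCLE = ["if0", "if1", "sd", "of0", "of1"]


def priority_order(last_winner):
    """Channels from highest to lowest priority after last_winner's access."""
    i = CYCLE.index(last_winner)
    return CYCLE[i + 1:] + CYCLE[:i + 1]


def arbitrate(requests, last_winner, urgent=None):
    """Pick the next FIFO channel to service, or None if nothing is pending.

    requests : set of channel names with a pending access
    urgent   : channel the microcode processor is frozen on, if any
    """
    if urgent is not None and urgent in requests:
        return urgent  # automatic override bypasses the rotation
    for channel in priority_order(last_winner):
        if channel in requests:
            return channel
    return None


order = priority_order("if0")
```

Non-FIFO requests such as refresh would be resolved ahead of this selection and are not modeled here.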


VRAM POINTERS 


The VRAM interface maintains VRAM pointers for
the FIFOs, as well as display-related pointers for the
82750DB. Internally each pointer or address is
stored as a 30-bit value addressing a double word in
VRAM. The pointer values are read and written as
two 16-bit words representing a 32-bit byte address
(refer to Figure 3-3). With a 30-bit double word
address, the 82750PB can decode a VRAM address
space of 1G double words or 4 GBytes.


Input and output FIFOs can address down to a single
word or byte in VRAM. A FIFO's pointer is post-incremented
or post-decremented in parallel with its
VRAM read or write cycle.


The statistical decoder can only start decoding bitstreams
on double word boundaries in VRAM and
can only increment through VRAM. The decoder's
pointer is post-incremented in parallel with each of
its VRAM read cycles.


Display-related pointers are updated by adding a
pitch value to the current value during the corresponding
transfer cycle.


If a VRAM pointer appears on the B-Bus as source
or as a destination then the following rules apply:


Rule 1


If a B-Bus destination refers to an address that is
both Even and > 0x1f, then the source is restricted
to "-lo" pointers if the source refers to a pointer.


Rule 2


If a B-Bus destination refers to an address that is
both Odd and > 0x1f, then the source is restricted to
"-hi" pointers if the source refers to a pointer.
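The relationship between the two 16-bit pointer words and the internal 30-bit double-word address can be sketched as follows. The helper names `split_pointer` and `dword_address` are hypothetical.

```python
def split_pointer(byte_addr):
    """Split a 32-bit byte address into its (-hi, -lo) 16-bit words."""
    return (byte_addr >> 16) & 0xFFFF, byte_addr & 0xFFFF


def dword_address(hi, lo):
    """30-bit double-word address held internally for a pointer.

    The two low bits of the byte address select a byte within the
    double word and are not part of the double-word address.
    """
    return (((hi & 0xFFFF) << 16) | (lo & 0xFFFF)) >> 2


hi, lo = split_pointer(0x00012346)
internal = dword_address(hi, lo)
```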


SHADOW COPY 


When a VODD, VEVEN, or DFL code is received
from the 82750DB over the VBUS, a shadow copy is
scheduled. The actual shadow copy will occur as
soon as the priority logic allows. Any VRAM access
in progress must complete and a pending transfer
cycle, if any, must be performed before the shadow
copy can start. During the operation, shadow registers
for the Y-START, Y-PITCH, VU-START, VU-PITCH,
82750DB-START, and 82750DB-PITCH are
copied into the corresponding working registers.
During display refresh, the address arithmetic is performed
on the working registers. The shadow registers
can be loaded by the host CPU or by a microcode
routine with less critical timing constraints, and
then copied instantly by a shadow copy when it is time
to update the registers, either prior to the next field
or during the active display for split screen effects.


— > inFIFO1 — > inFIFOO — > outFIFO1 —> outFIFO 0 — > Statistical Decoder 


Figure 3-2. Cyclic Ordering of FIFOs 


[Figure: bits 31:2 of the 32-bit value hold the VRAM (double-word)
address; bits 1:0 hold the byte address within the double word.
The -hi word is the most significant word of the VRAM address.]


Figure 3-3. VRAM Addressing


There are actually two shadow registers for Y-START:
one for the start of odd fields and one for the start
of even fields. A VODD code causes Y-START-ODD
to be copied into the working register Y-CURRENT.
Similarly, a VEVEN code causes the Y-START-EVEN
to be copied into Y-CURRENT. A DFL code
causes the Y-START-ODD value to be copied if the
most recent start of field code received is a VODD,
or a Y-START-EVEN value if the most recent start of
field code was a VEVEN. This allows a simple interlaced
or non-interlaced display to be refreshed with
no host CPU intervention. For more complex displays,
such as split screens, the host CPU must update
the shadow registers prior to each shadow
copy. A shadow copy operation requires 2 T-cycles.
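The selection of the Y-START shadow register on each code can be modeled with a small class. This sketch covers only the Y pointer; `YShadow` and its attribute names are illustrative. Leaving Y-CURRENT unchanged when a DFL code arrives before any start-of-field code is an assumption, since that case is not described.

```python
class YShadow:
    """Model of the Y-START shadow registers and Y-CURRENT."""

    def __init__(self, y_start_odd, y_start_even):
        self.y_start_odd = y_start_odd
        self.y_start_even = y_start_even
        self.last_field = None   # "odd" or "even" once a field code is seen
        self.y_current = None

    def on_vbus_code(self, code):
        """Apply the shadow copy triggered by VODD, VEVEN, or DFL."""
        if code == "VODD":
            self.last_field = "odd"
        elif code == "VEVEN":
            self.last_field = "even"
        elif code != "DFL":
            return  # other codes do not schedule a shadow copy
        if self.last_field == "odd":
            self.y_current = self.y_start_odd
        elif self.last_field == "even":
            self.y_current = self.y_start_even


regs = YShadow(y_start_odd=0x1000, y_start_even=0x2000)
regs.on_vbus_code("VEVEN")
regs.y_start_even = 0x3000   # host updates the shadow for a split screen
regs.on_vbus_code("DFL")     # re-copies the even start value
```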
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. Host Interface 


The Host Interface provides the following functions:


• Arbitrates host CPU and 82750PB access to
  VRAM.

• Provides the host access to external devices.

• Provides the host access to 82750PB internal
  registers and memories.


Signals specific to the Host Interface are listed in
Table 3-5.


Table 3-5. Host Interface Signals


HREQ#   | HOST REQUEST: Asynchronous request from the host for all types of
        | host access. Used both to request and release system buses.

HREG#   | HOST REGISTER: Single-ranked control to request host access to
        | 82750PB internal registers in concert with HRAM#.

HRAM#   | HOST VRAM: Single-ranked control to request host access to VRAM in
        | concert with HREG#.

HALEN#  | HOST ADDRESS LATCH ENABLE: Asynchronous status from the host
        | indicating the presence of valid address, write enable (transaction
        | direction control), and the byte enables at the interface of the 82750PB.

HBUSEN# | HOST BUS ENABLE: 82750PB synchronous status granting the host
        | access to the address, write enable, data bus, and byte enables at the
        | interface of the 82750PB.

HRDY#   | HOST READY: 82750PB synchronous status to the host indicating the
        | presence of valid data appearing at the 82750PB's data bus for VRAM
        | and register accesses and optionally for external accesses.

HINT#   | HOST INTERRUPT: 82750PB synchronous interrupt to the host, set
        | under direct or indirect microprogram control.


Signals common to the host, VRAM, and external device interfaces are listed in Table 3-6. 


Table 3-6. Host, VRAM, and External Device Interfaces 


A        | ADDRESS BUS: System address bus used to select unique VRAM, the
         | 82750PB register, and external device locations that will be accessed
         | under host control. The lower seven bits A[8:2] are bidirectional and are
         | used during register accesses.

D[31:0]  | DATA BUS: Bidirectional system data bus used to transfer data to and
         | from all sources and destinations. When transferring 16-bit host register
         | values, the data bus MSH and LSH will both carry identical values.

WE#      | WRITE ENABLE: Bidirectional, single-ranked signal used to determine
         | the data transfer direction. When active during host register cycles, data
         | flows from the host to an 82750PB destination. During host VRAM cycles,
         | WE# active will define the data direction to be from the host to VRAM.

BE[3:0]# | BYTE ENABLE: Bidirectional signals used to select the bytes that will be
         | modified during data transactions. All host register transactions are
         | performed 16 bits at a time, while VRAM may be modified 8 bits at a time.
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As with VRAM operations, host operations are described through a sequence of T-states. Table 3-7 defines 
the T-states used to implement all host transactions with VRAM, external devices, and the 82750PB.


The master arbitration state diagram that defines the VRAM/Host transactions is provided in Figure 3-1.


Table 3-7. 82750PB Host Transaction States


State | Description

TA    | First state of any host transaction. Entry into TA will be granted after
      | HREQ# has been asserted. During this state, the 82750PB will tri-state
      | its address, data bus, write enable, and byte enable signals to provide a
      | full cycle of "dead-band" before the assertion of HBUSEN#. In the state
      | immediately following TA, HBUSEN# will assert, allowing the host to drive
      | the host buses.

TB    | First cycle in which the host is granted bus access for register or VRAM
      | transactions. The sequencer will remain in TB until HALEN# is received,
      | indicating that the address, write enable, and byte enable signals are
      | stable at the 82750PB pins.

TC1   | First cycle that output data is valid.

TC    | This state is entered to wait for the completion of the current host cycle.
      | The cycle is defined as complete when HREQ# deasserts. HRDY# is
      | asserted along with valid data until the transition to state TD occurs.

TD    | The last cycle of a host transaction. HBUSEN# is deasserted, allowing
      | one dead-band cycle to allow control of the address, data, write enable,
      | and byte enable signals to be returned to the 82750PB.

TV1   | First cycle of a Host VRAM transaction. Memory is requested and is
      | followed by a transition to TV2.

TV2   | Last cycle of a Host VRAM transaction. The sequencer will remain in TV2
      | until MRDY# is received.


A single stage of input synchronization is employed
for HREG#, HRAM#, WE#, and BE[0]#, while
HREQ# and HALEN# are programmable to have
one or two stages by bit 12 of the Microcode Processor
Control Register. See Table 3-10. T-state transitions
are caused by the synchronized versions of
these signals.


The synchronized versions of HREG# and HRAM#
must be stable before entry into T-state TA. The
synchronized versions of WE#, BE[0]#, and
HALEN# should be stable before exiting T-State
TB. Once asserted, all of the above signals should
remain stable until the deassertion of HBUSEN#.


The type of host cycle to perform is determined by
the states of HREG# and HRAM# as indicated in
Table 3-8.


Table 3-8. Host Cycle Types


HREG# | HRAM# | Host Cycle Type

HOST REGISTER ACCESS 


The host has access to the 82750PB's internal registers
and memories to monitor and control the operation
of the microcode processor, provide a means
of debugging microprogram routines, and to function
as the primary test port for production testing.


Register access is initiated by the host asserting
HREQ#, HREG#, and HRAM# as shown in Table
3-8 and in the timing diagrams on pages 42 through
45. After the host has been granted bus access by
an active HBUSEN# in state TB, the address, write
enable, and byte enables may be driven. After these
signals have stabilized, HALEN# is asserted, enabling
a read or a write operation to occur.
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In the case of a register read, state TC1 is entered 
and the data bus is driven with the internal value. 
One cycle later, a transition to state TC occurs, and 
HRDY # activates, signaling the presence of stabi- 
lized data at the 82750PB data pins. This state (TC) 
will be maintained until the host deasserts HREQ#, 
signaling the completion of the cycle that caused a 
transition to state TD. 


In the case of a register write, TC1 is again entered
(from TB), but the data bus may now be driven by
the host. (During host cycles, data bus drive activity
is indirectly controlled by WE#, and an additional
dead-band is provided by entry into state TC1 to allow
for internal WE# stabilization.) Stable data at
the 82750PB interface, as well as the completion of
the write cycle, is signaled by the deassertion of
HREQ#. As with reads, the deactivation of HRDY#
signals the transition to state TD.


As state TD is entered, HRDY# and HBUSEN#
deassert, the address, data, write enable, and byte
enables tri-state, and bus control is returned to the
82750PB in the following cycle.


HOST VRAM ACCESS 


Because the 82750PB is so closely coupled with 
VRAM, host accesses to VRAM are arbitrated and 
controlled by the 82750PB. VRAM access is initiated 
by the host asserting HREQ#, HREG#, and 
HRAM# as shown in the Host Cycle Table above 
and in the timing diagrams on pages 42 through 45. 
After the host has been granted bus access by an 
active HBUSEN#, the address, write enable, and 
byte enables may then be driven. After these signals 
have stabilized at the memory devices (or longest 
relevant propagation path), HALEN# is asserted, 
enabling a read or a write operation to occur. 


Because VRAM will not drive the data bus until after
a memory request, a transition into state TC1 to allow
for data bus direction stabilization is not required.
Instead, a transition to state TV1 occurs,
which asserts MREQ# for a single cycle and is followed
by a transition to TV2. TV2 will remain the
current state until the reception of an active
MRDY#.


In the case of a VRAM read, the memory data bus
will be driven during TV1, and valid data will appear
in state TV2. Data will be guaranteed valid coincident
with the deassertion of MRDY# from memory.


In the case of a VRAM write, the memory data bus is
driven with valid data during TV1. Again, the reception
of MRDY# will serve to indicate the completion
of the memory operation.


NOTE:
The host device must transmit or receive memory
data so that the data is valid at its destination
(memory or host) at the trailing edge of MRDY#.


After MRDY# becomes active, a transition from TV2
into TC1 is accomplished to allow time to propagate
data to the host. TC is then entered to await the
deassertion of HREQ# (if it has not already occurred).
TD is then entered, duplicating the dead-banding
previously described.


HOST EXTERNAL ACCESS 


In addition to VRAM and register host access, an
external device access mechanism is provided. During
this access, upon the receipt of HREQ# with
HREG# and HRAM# inactive, the 82750PB releases
the address, data, write enable, and byte enables
in state TA.


The difference here is that state TC1 is directly entered
from TA, thereby ignoring any transitions of
HALEN#. Since the 82750PB also ignores the data
bus direction control (write enable), the host and an
external device may communicate unencumbered
by the 82750PB.


Entry into state TC directly follows TC1 in the expected
sequence, and the 82750PB remains there until HREQ# is
released. This is followed by entry into TD.
HBUSEN# is asserted during the time that TC1
and TCN are active.


During an external access, HRDY# is not asserted
unless the external logic asserts MRDY# as shown
in Figure 3-7.
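

The three host cycle types described above differ mainly in the order of T-states visited. The following Python sketch is an illustrative model only, not part of the 82750PB specification: it lists the T-states between idle periods for each cycle type. The names `HostCycle`, `t_state_sequence`, `vram_wait_states`, and `hreq_hold` are hypothetical, and the time spent in TB is simplified to a single state.

```python
from enum import Enum


class HostCycle(Enum):
    """Host cycle types described in the text (names are illustrative)."""
    REGISTER = "register"
    VRAM = "vram"
    EXTERNAL = "external"


def t_state_sequence(cycle, vram_wait_states=0, hreq_hold=0):
    """Return the T-states visited between idle periods.

    vram_wait_states: extra TV2 states spent waiting for MRDY# (assumption).
    hreq_hold: extra TC states spent waiting for HREQ# to deassert (assumption).
    """
    if cycle is HostCycle.EXTERNAL:
        # TC1 is entered directly from TA; HALEN# transitions are ignored.
        seq = ["TA"]
    else:
        seq = ["TA", "TB"]
    if cycle is HostCycle.VRAM:
        # TV1 asserts MREQ# for one cycle; TV2 persists until MRDY# is active.
        seq += ["TV1"] + ["TV2"] * (1 + vram_wait_states)
    # TC1, then TC until HREQ# is released, then TD.
    seq += ["TC1"] + ["TC"] * (1 + hreq_hold) + ["TD"]
    return seq
```

A register access therefore passes through TA, TB, TC1, TC, and TD, while a VRAM access inserts TV1 and one or more TV2 states before TC1.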


HOST REGISTER ADDRESS MAPPING 


Table 3-9 shows the host address mapping of the
on-chip registers and memories, in terms of the offset
in bytes from the base address for 82750PB
accesses. Note that the 82750PB only supports
word accesses to these registers. Therefore, the
least significant bit of the byte offset should be set to
zero. The 82750PB forms the register address from
inputs on the A[31:2] pins and BE#[3:0] pins. The
A[31:2] pins specify the double word address of the
register, and combinations of the BE# pins determine
which of the two words within the double word is being
addressed. BE#[3:0] = 1100 (binary) selects the least
significant word within a double word, and BE#[3:0] =
0011 (binary) selects the most significant word within a
double word. These are the only two valid patterns
for BE# inputs during a host register access cycle.
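

As an illustration of the addressing rule above, the following Python sketch converts a register byte offset into the value presented on A[31:2] and the BE#[3:0] pattern. The function name `host_register_cycle` and the `base_address` parameter are assumptions made for this example, and the base address is assumed to be double-word aligned.

```python
WORD_LOW_BE = 0b1100   # BE#[3:0] pattern selecting the least significant word
WORD_HIGH_BE = 0b0011  # BE#[3:0] pattern selecting the most significant word


def host_register_cycle(byte_offset, base_address=0):
    """Return (A[31:2] value, BE#[3:0] pattern) for a 16-bit register access.

    base_address is a hypothetical, double-word-aligned base for 82750PB
    accesses; byte_offset is the offset listed in Table 3-9.
    """
    if byte_offset & 1:
        raise ValueError("only word accesses are supported; bit 0 must be zero")
    address = base_address + byte_offset
    dword_address = address >> 2                       # driven on A[31:2]
    be_n = WORD_HIGH_BE if address & 2 else WORD_LOW_BE
    return dword_address, be_n
```

For example, offsets 0x100 and 0x102 fall in the same double word and differ only in the byte enable pattern.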


Table 3-9. Host Address Mapping


Byte Address     Description

0x000-0x07E      (a) A source and destination registers
0x080-0x0FE      (b) B source and destination registers
0x100-0x17E      (c) Microcode processor control and status registers
0x180-0x1FE      (d) VRAM pointer RAM

NOTE:
The host should only perform 16-bit word reads
or writes to 82750PB registers. The 82750PB
does not support byte reads or writes or double
word reads or writes to on-chip registers.


When the host CPU reads or writes to areas (a), (b), or
(d) and the 82750PB is not already in a HALT state,
the microcode processor is automatically HALTED
for the one T-cycle actually required to complete the
data transfer, and then the processor is restarted
after the transfer is complete. If the 82750PB is in a
HALT state when the host access is initiated, it will
remain in the HALT state following the completion of
the access. This is transparent to both the host CPU
and the microcode processor.


During an access to areas (a) or (b), bits 6:1 of the
byte offset should be set to the source or destination
code for the register that will be read or written.
The coding is the same as used in the microcode
instruction word. Bit 0 is always set to a zero. Refer
to the 82750PB Source and Destination Coding
Table found in Chapter 4.
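

The offset rule for areas (a) and (b) can be sketched as follows. This is a minimal Python illustration, assuming the area base offsets from Table 3-9; the names `AREA_BASE` and `register_byte_offset` are hypothetical, and the 6-bit codes themselves must come from the Source and Destination Coding table.

```python
# Base byte offsets of areas (a) and (b), taken from Table 3-9.
AREA_BASE = {"a": 0x000, "b": 0x080}


def register_byte_offset(area, code):
    """Place a 6-bit source/destination code in bits 6:1 of the byte offset."""
    if not 0 <= code <= 0x3F:
        raise ValueError("source/destination codes are 6 bits wide")
    # Bit 0 stays zero because only word accesses are supported.
    return AREA_BASE[area] | (code << 1)
```

With this rule the highest code, 0x3F, maps to offsets 0x07E and 0x0FE, the last entries of areas (a) and (b).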


Area (c) contains one write-only register, the CONTROL
register, and two read-only registers, the INTERRUPT
FLAG register and the microcode PROCESSOR
STATUS register. The CONTROL register is
used to halt or single-step the microcode processor,
enable or mask interrupts to the host CPU,
select the signal that is output via the PMON/FRZ
pin, and enable or disable the 82750PA emulation
mode. The bit assignments for the CONTROL register
are given in Table 3-10.


During reset of the 82750PB, the HALT bit is set to a
one, the six Interrupt Enable bits are reset to zero,
the Disable SYNC bit is set to zero, the PMON/FRZ
bit is set to zero (so that the FRZ signal is output),
and the Enable 82750PB bit is reset to zero (so that
on reset, the 82750PB starts in 82750PA emulation
mode).


Table 3-10. Bit Assignments for Microcode Processor CONTROL
Register (Write-Only, Byte Offset = 0x100)


Bit      Name             Description

Bit 0    HALT             1 = Microcode Processor Halt
                          0 = Microcode Processor Run
Bit 1    SINGLE-STEP      1 = Execute One Instruction and then Halt
                              (Only when Already Halted, Bit 0 = 1)
                          0 = No Action
Bit 2    Enable MCINT     1 = Enable Microcode Interrupts to Host CPU
                          0 = Mask Microcode Interrupts
Bit 3    Enable VBI       1 = Enable Vertical Blanking Interrupt to Host CPU
                          0 = Mask Vertical Blanking Interrupt
Bit 4    Enable DFL       1 = Enable DFL Interrupt to Host CPU
                          0 = Mask DFL Interrupt
Bit 5    Enable SD        1 = Enable 82750DB Shutdown Interrupt to Host
                          0 = Mask SD Interrupt
Bit 6    Enable OF        1 = Enable Odd Field Interrupt
                          0 = Mask OF Interrupt
Bit 7    Enable EF        1 = Enable Even Field Interrupt
                          0 = Mask EF Interrupt
Bit 12   Disable SYNC     1 = Disable Synchronizers for HREQ#/HALEN#
                          0 = Enable Synchronizers for HREQ#/HALEN#
Bit 13   PMON/FRZ         1 = Output FRZ# Signal on PMFRZ# Pin
                          0 = Output PMON# Signal on PMFRZ# Pin
Bit 14                    RESERVED; Write as Zero
Bit 15   Enable 82750PB   1 = Enable 82750PB Mode
                          0 = Enable 82750PA Emulation Mode

*All other bits are reserved for future use, and should be written as zeros.


The INTERRUPT FLAG register holds a flag for
each of the six interrupt sources. A flag bit is set to a
one when the interrupt condition is detected (independent
of the state of the corresponding Interrupt
Enable/Mask bit in the CONTROL register), and all
flags are cleared to zero each time the INTERRUPT
FLAG register is read. If this register is read during
the same cycle that an interrupt condition is detected,
the flag bit corresponding to that interrupt condition
will remain at a one. This new interrupt condition
will then be seen by the host processor when it next
reads the INTERRUPT FLAG register. This ensures
that an interrupt is not lost if it occurs in the
same cycle that the INTERRUPT FLAG register is
read (and reset). In addition, the Microcode Interrupt
source has an overflow flag that indicates if more
than one Microcode Interrupt has occurred since the
INTERRUPT FLAG register was last read. The bit
assignments for the INTERRUPT FLAG register are listed
in Table 3-11.


The PROCESSOR STATUS register holds four
status bits: HALT, FREEZE, PMON, and SYNC
status. HALT indicates that the processor is HALTED
due to the HALT bit in the CONTROL register being
set to a one or due to the HALT# pin being
asserted. FREEZE indicates that the processor is
waiting for one of the VRAM channels to become
ready or is waiting for an access to the VRAM pointer
RAM. PMON is a signal that can be toggled by a
special ALU opcode or a special B source code.
This signal can be used for performance monitoring
of microcode. The SYNC status bit indicates the presence
or absence of the internal synchronizers for the
HREQ# and HALEN# inputs. In addition, the Interrupt
Mask bits that are written into the PROCESSOR
CONTROL register can be read from this register.
These mask bits are read in the same polarity that
they are written, but note that the bit positions and
bit ordering are not consistent with the PROCESSOR
CONTROL register. The bit assignments for
this register are given in Table 3-12.


Address mappings for areas (a), (b), and (d) are given
in Tables 3-13 to 3-15.
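

For host software, the CONTROL register bit assignments of Table 3-10 can be captured as constants. The following Python sketch is illustrative only: the constant names and the `control_word` helper are hypothetical, and the helper simply enforces two rules stated in the table (reserved bits are written as zero, and SINGLE-STEP is only meaningful with HALT set).

```python
HALT = 1 << 0
SINGLE_STEP = 1 << 1
ENABLE_MCINT = 1 << 2
ENABLE_VBI = 1 << 3
ENABLE_DFL = 1 << 4
ENABLE_SD = 1 << 5
ENABLE_OF = 1 << 6
ENABLE_EF = 1 << 7
DISABLE_SYNC = 1 << 12
PMON_FRZ = 1 << 13
ENABLE_82750PB = 1 << 15

DEFINED_BITS = 0x00FF | DISABLE_SYNC | PMON_FRZ | ENABLE_82750PB
RESERVED_BITS = 0xFFFF & ~DEFINED_BITS   # bits 8-11 and 14


def control_word(*flags):
    """OR the given flags into a 16-bit CONTROL register value."""
    word = 0
    for flag in flags:
        word |= flag
    if word & RESERVED_BITS:
        raise ValueError("reserved bits must be written as zero")
    if (word & SINGLE_STEP) and not (word & HALT):
        raise ValueError("SINGLE-STEP applies only while HALT is set")
    return word
```

For instance, running in 82750PB mode with microcode and vertical blanking interrupts enabled corresponds to `control_word(ENABLE_82750PB, ENABLE_MCINT, ENABLE_VBI)`, written to byte offset 0x100.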


Table 3-11. Bit Assignments for INTERRUPT FLAG Register 
(Read-Only, Byte Offset = 0x100) 


OF Interrupt Flag
MCINT Overflow Flag


Bit 15 DFL Display Format Load Interrupt


Table 3-12. Bit Assignments for PROCESSOR STATUS Register 
(Read-Only, Byte Offset = 0x102) 


Bit      Description

Bit 0    HALT (1 = Halted, 0 = Running)
         Synchronizers on HREQ#/HALEN# (0 = Enabled, 1 = Disabled)
         MCINT Microcode Interrupt Mask
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Table 3-13. 82750PB A Bus Source/Destination Address Mapping | 


ADST 


a 
Po10 | 
Poors | 


| 0x022 


2 
ae 
[—ov0se | 
[—oroae 


maddr 


Icnt Icnt 


nt 
oO 
rt 

fee 
rm. 
r4 z 
r5 

r6 

r7 


cc 
ont 
fr 
re 
r4 
r5 
r6 
TT 
mcode3 
eae, 


mcode3 


mcode2 


pixint 
— *dramt1 
*dram2 
*dram1+ + 
*dram2+ + 
*dram1— — 


*dram1+ + 


dram1  dramt 
| dram 


*dram2— — 


dram2 
dram3 
— dram4 


*out1 


ASRC > 


Address (Hex) 
—— 0x042 
— 0x044 
 0x046 
~~" 0x048 
“ Ox04A 
— 0x04C 
— Ox04E 


ADST_. 
outi + + 
. Shift-hi. 
 outt-hi 
*out2 | 
©. Out2+ + 
shift-r 7 
out2-hi. 


out1-c 


*in2 
*stat 
*stat# 


—0x052 
—  Ox054 


int-c 
_ — Shift-h 
| int-hi . 
0x058 _ out2-c.. 
| Ox05A 
~  Qx05C 
~~ 0x05E 

0x060 


in2-c 


in2-hi | 


Ox06A 
14 r14 


~ Qx06C r 


= 
_ 


5 r15 
shift 
| font 


| 0x070 

— 0x072_ 
0x074 

| 0x076 
0x078 

| 0x07A 
0x07C 

: Ox07E 


?) 
QO 


font 


*dram3 
*dram4 
*dram3+ + 
*dram4+ + 


*dram4 
*dram3+ + 
*dram4+ + 
*dram3 — — 


*dram4— — *dram4— — 
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Table 3-14. 82750PB B Bus Source/Destination Address Mapping 


Address (Hex) | _ BDST Address (Hex) 

*dram3_  0x0C6 outt-hi- 

*dram4 — 0x0C8 stat-lo 

3 
r/ 


BSRC 


3 

: 

: 
7 


OxODC in2-lo 


2 


Ox0E8 yeven-lo 


. 
ro 
[01088 tera 


r8 

r9 

14 
Cc 
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7 Table 3-15. VRAM Pointer RAM Mapping | 


Byte Address   Name        Description

0x180          Yw-lo       Working Copy of Y Pointer
0x182          Yw-hi
0x184          out1-lo     Output FIFO 1 Pointer
0x186          out1-hi
0x188          Yw-pitch    Working Copy of Y Pitch
0x18A                      RESERVED
0x18C          out2-lo     Output FIFO 2 Pointer
0x18E          out2-hi
0x190          VUw-lo      Working Copy of VU Pointer
0x192          VUw-hi
0x194          in1-lo      Input FIFO 1 Pointer
0x196          in1-hi
0x198          VUpitchw    Working Copy of VU Pitch
0x19A                      Working Copy of 82750DB Pitch
0x19C          in2-lo      Input FIFO 2 Pointer
0x19E          in2-hi
0x1A0          vptw-lo     Working Copy of 82750DB Pointer
0x1A2          vptw-hi
0x1A4          stat-lo     Working Copy of Statistical Decoder Pointer
0x1A6          stat-hi
0x1A8          Yeven-lo    Shadow Copy of Y Start Even Pointer
0x1AA          Yeven-hi
0x1AC          Yodd-lo     Shadow Copy of Y Start Odd Pointer
0x1AE          Yodd-hi
0x1B0          Ypitch      Shadow Copy of Y Pitch
0x1B2          RFSH        Cycles per RFSH Code from 82750DB
0x1B4          VU-lo       Shadow Copy of VU Start Pointer
0x1B6          VU-hi
0x1B8          VUpitch     Shadow Copy of VU Pitch
0x1BA          vpitch      Shadow Copy of 82750DB Pitch
0x1BC          vptr-lo     Shadow Copy of 82750DB Pointer
0x1BE          vptr-hi

NOTE: Register rfont is a write-only register and should never be read.


Initializing the 82750PB


The 82750PB is placed in a RESET state by asserting
RESET# for at least ten T-cycles. In the RESET
state, which continues until RESET# is released, all
of the 82750PB's outputs are tri-stated for compatibility
with board test requirements.


Proper initialization of the 82750PB requires that the
82750PB is held in a RESET state by keeping RESET#
active for at least 10 T-cycles, and then releasing
RESET#. This is referred to as the INITIAL
state. In the INITIAL state:

• The microcode processor is halted.

• All interrupts are masked, and the interrupt
  flags are cleared.

• The 82750PA/82750PB instruction format select
  bit is set to the 82750PA.

• The VRAM interface is ready to service VRAM
  requests; however, none of the VRAM pointers
  are valid.

• The number of refresh cycles that will be generated
  each time a RFSH code is received from the
  82750DB is set to 14 cycles.

• All bidirectional I/O pins are tristated.


After the 82750PB has been initialized, i.e., placed in 
the INITIAL state, but prior to releasing the 
82750DB’s reset signal, the following operations 
must be performed: 


• Load the REFRESH-CYCLES-PER-LINE register
  with the appropriate value. The equation for the
  value is VALUE = 2^N - 1, where N is the number
  of cycles; for example, 5 refresh cycles would
  result in VALUE = 2^5 - 1 = 31 (decimal) = 0x001F.
  The refresh register is 14 bits wide, and it
  generates one refresh every time a right
  shift results in a '1' bit. It continues the right shifting
  until it finds a '0' bit and halts. Hence, from a
  programming point of view, 0x001F and 0xFFDF both
  give 5 refresh cycles per line.

• Load the shadow copies of Y, VU, and 82750DB
  pointers and pitches.

• Load the appropriate 82750DB Register Load list
  into VRAM starting at the address pointed to by
  the 82750DB pointer.
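

The refresh register rule above can be checked with a short Python sketch. It is a behavioral model under the assumption that the register simply counts consecutive '1' bits starting from bit 0 of its 14-bit contents; the function names `refresh_value` and `refresh_cycles` are hypothetical.

```python
REFRESH_REGISTER_BITS = 14


def refresh_value(n_cycles):
    """Value to load for n_cycles refreshes: VALUE = 2^N - 1."""
    if not 0 <= n_cycles <= REFRESH_REGISTER_BITS:
        raise ValueError("the refresh register is 14 bits wide")
    return (1 << n_cycles) - 1


def refresh_cycles(value):
    """Count refreshes: one per '1' bit shifted out, stopping at the first '0'."""
    value &= (1 << REFRESH_REGISTER_BITS) - 1   # only 14 bits are kept
    count = 0
    while value & 1:
        count += 1
        value >>= 1
    return count
```

Under this model 0x001F and 0xFFDF both yield 5 refresh cycles, and an all-ones register yields the 14 cycles that apply after reset.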


Prior to releasing the microcode processor from its
HALTed state to run a microcode program, the following
operations must be performed:

• If 82750PB code is to be executed, bit 15 of the
  82750PB CONTROL register must be set to a
  one.

• Load a microcode program into microcode RAM
  on the 82750PB by writing to the three instruction
  word registers (mcode1, the most significant
  word of the instruction; mcode2; and
  mcode3, the least significant word of the
  instruction, the one containing the next address
  field) and then writing to maddr, the address in
  microcode RAM where the instruction will be
  loaded.

• Load the PC with the address in microcode RAM
  of the first instruction to be executed.

• Write to the 82750PB CONTROL register with the
  HALT bit (bit 0) set to zero, causing the processor
  to start executing an instruction sequence, or with
  the SINGLE-STEP bit (bit 1) set to a one (keeping
  HALT also set to one), causing the processor to
  execute a single instruction.
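

The microcode load step can be sketched in Python as follows. This is an illustration under stated assumptions, not a driver: the functions `split_instruction` and `load_sequence` are hypothetical, register names are returned symbolically rather than as byte offsets, and the order of the three mcode writes before the maddr write is an assumption (the text only requires that maddr be written after them).

```python
MICROCODE_RAM_DEPTH = 512   # instructions held by the microcode RAM


def split_instruction(word48):
    """Split a 48-bit instruction into (mcode1, mcode2, mcode3) 16-bit words.

    mcode1 is the most significant word; mcode3 is the least significant
    word, which contains the next address field.
    """
    if not 0 <= word48 < (1 << 48):
        raise ValueError("instruction words are 48 bits wide")
    return (word48 >> 32) & 0xFFFF, (word48 >> 16) & 0xFFFF, word48 & 0xFFFF


def load_sequence(program, start_address=0):
    """Return the (register name, value) writes that load a program."""
    if start_address + len(program) > MICROCODE_RAM_DEPTH:
        raise ValueError("program does not fit in microcode RAM")
    writes = []
    for index, word in enumerate(program):
        mcode1, mcode2, mcode3 = split_instruction(word)
        writes += [("mcode1", mcode1), ("mcode2", mcode2), ("mcode3", mcode3),
                   ("maddr", start_address + index)]   # maddr written last
    return writes
```

Each tuple would be turned into a 16-bit host write at the byte offset of the named register in area (a).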


Performance Monitoring 


Two signals, FRZ# and PMON#, which are useful
for microcode performance monitoring, are available
both as external signals, multiplexed on a single output
pin, and as bits in the Processor Status register.
FRZ# is active for each T-cycle when the microcode
processor is frozen, waiting for access to
VRAM or to the VRAM Pointer RAM. PMON# can
be toggled by a special ALU opcode or a special B
bus source code. This allows PMON# to be used to
indicate what particular segment of microcode is being
executed. The PMON/FRZ bit in the Processor
Control register selects the signal that is being output.


Freezes may indicate that the microcode routine is
not making the most efficient use of the input and
output FIFO buffering. This is particularly important
for the inner loops of graphics and video routines
that are memory-bandwidth limited. Ideally, inner
loops should be balanced so that the rate at which pixels
are processed is equal to the rate at which they can be
read from and written to VRAM with no freezes. The
buffering in the input and output FIFOs serves to make
sequential reads and writes to VRAM more efficient
by performing full 64-bit reads and writes, instead of
individual 8-bit or 16-bit accesses. This has the effect
of averaging the VRAM read/write rate over a
number of instruction times. For example, if the
82750PB is performing a 64-bit read or write every 8
T-cycles, for an average of 8 bits per T-cycle, a two
instruction inner loop could read one 8-bit pixel and
write one 8-bit pixel without any freezes occurring
(assuming the source pixels and the destination pixels
are each sequential).
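

The balance argument in this example can be written out as a small calculation. The Python sketch below assumes one microcode instruction per T-cycle, which is an assumption made here for illustration, and the function names `vram_bits_per_t_cycle` and `loop_is_balanced` are hypothetical.

```python
from fractions import Fraction


def vram_bits_per_t_cycle(access_bits=64, t_cycles_per_access=8):
    """Average VRAM bandwidth available, in bits per T-cycle."""
    return Fraction(access_bits, t_cycles_per_access)


def loop_is_balanced(loop_instructions, bits_read, bits_written,
                     available=None):
    """True if the loop's average VRAM demand fits the available rate.

    Assumes one instruction executes per T-cycle (illustrative assumption).
    """
    if available is None:
        available = vram_bits_per_t_cycle()
    demand = Fraction(bits_read + bits_written, loop_instructions)
    return demand <= available
```

With the figures from the text, a two instruction loop that reads 8 bits and writes 8 bits demands 8 bits per T-cycle, which matches the 8 bits per T-cycle available.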


The PMON# provides a more standard performance 
monitoring capability by indicating when a particular 
segment of microcode, bracketed by special instruc- 
tions that toggle the PMON# signal, is being exe- 
cuted. This allows either absolute execution-time 
measurement or measurement of the fraction of the 
total-execution time that is required by the segment. 
Either the ALU opcode ‘prof’ or the B bus source 
code ‘prof’ will toggle the PMON signal. 


An external HALT pin is provided on the 82750PB to
allow external debugging hardware to immediately
halt the microcode processor. Activating this input
causes the microcode processor to halt prior to executing
the next instruction. When the processor is
halted, the VRAM interface portion of the 82750PB
continues to operate normally, performing transfer
cycles, refresh cycles, and shadow copies as requested
by the 82750DB.


Host/VRAM Timing Diagrams


Figures 3-4 through 3-8 are Host/VRAM Timing Diagrams.


[Timing diagram 240854-9. Signals shown: CLK, MREQ#, A[31:3], A[2],
BE#[3:0], WE#, NXTFST#, TRNFR#, RFSH#, and MRDY# (to 82750PB).
Panels: 82750PB VRAM Write Cycle Pair (zero wait states for both cycles);
82750PB VRAM Read Cycle Pair (first cycle has one wait state).]


NOTES:
1. Address pin A[2] is always ZERO for the first cycle of a cycle pair and ONE for the second cycle.
2. The two cycles of a cycle pair are both writes or both reads.


Figure 3-4. VRAM Read and Write Cycles


[Timing diagram. Signals shown: CLK, MREQ#, A[31:2], BE#[3:0], WE#,
NXTFST#, TRNFR#, RFSH#, D[31:0], and MRDY#.
Panels: 82750PB VRAM Transfer Cycle (Transfer Read or Write depending on
state of WE# signal); 82750PB VRAM Refresh Cycle.]

NOTE: During the refresh cycle the address is held from the previous cycle;
a refresh row address is NOT output by the 82750PB. It is assumed that a
CAS before RAS refresh cycle is generated to the DRAM/VRAM chips.


Figure 3-5. VRAM Transfer and Refresh Cycles


[Timing diagram 240854-11. Signals shown: HREQ#, HREG#, HALEN#, A[31:2],
BE#[3:0], WE#, D[31:0], and HRDY#, for a read cycle and a write cycle.
Bidirectional signals driven by the host are marked in the diagram.]


NOTES:
1. MREQ#, RFSH#, TRNFR#, and NXTFST# remain inactive during Host Register Read and Write cycles.
2. If the HALEN#/HREQ# synchronizers are disabled, then the second Ti and Tb states will be missing.


Figure 3-6. Host Register Read and Write Cycles


[Timing diagram 240854-12. Signals shown: HREQ#, HREG#, HRAM#, HBUSEN#,
A[31:2], BE#[3:0], WE#, D[31:0], HRDY#, and MRDY#.]

Note: HRDY# is only asserted by the 82750PB if external logic asserts
MRDY#. If MRDY# is not asserted, HRDY# stays inactive during an
External cycle.


NOTES:
1. MREQ#, RFSH#, TRNFR#, and NXTFST# remain inactive during Host External Read and Write cycles.
2. If the Synchronizer on HREQ# is disabled, then the second Ti state will be missing.


Figure 3-7. Host External Read and Write Cycles


[Timing diagram. T-state sequence shown: Ti, Ti, Ta, Tb, Tv1, Tv2, Tc1,
Tc, Tc, Td, Ti. Signals shown: HREQ#, HREG#, HRAM#, HBUSEN#, HALEN#,
A[31:2], BE#[3:0], WE#, D[31:0], MREQ#, MRDY#, and HRDY#.]

Note: The 82750PB will stay in Tb for the maximum of:
1) one T-state, OR
2) two T-states after HALEN# goes low.


NOTES:
1. RFSH#, TRNFR#, and NXTFST# remain inactive during Host VRAM Read and Write cycles.
2. If the Synchronizers on HREQ#/HALEN# are disabled, then the second Ti state will be missing.


Figure 3-8. Host VRAM Read and Write Cycles
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4.0 MICROCODE INSTRUCTION 
FORMAT


Overview


The 82750PB executes two slightly different instruction
formats: one that is backward compatible with
the 82750PA and another that allows full access to
the microcode resources of the 82750PB. The
82750PA/82750PB bit in the 82750PB processor
control register determines which instruction format
is in effect (see Chapter 3). On reset, the 82750PB is
placed in 82750PA instruction format mode. In this
mode the 82750PB will execute binary microcode
originally assembled for the 82750PA in a manner
that is functionally equivalent to the 82750PA.


The following description applies to the 82750PB in- 
struction format. Exact definitions of 82750PB in- 
struction formats and field codings are shown in Fig- 
ure 4-2 and Table 4-5. 


Instruction Sequencing 


The instruction word for the 82750PB's microcode processor
is 48 bits wide. The Microcode RAM holds 512
instructions. Nine bits of each instruction specify the
address of the next instruction to be executed. Each
instruction fetch reads two instructions (an odd address
and even address pair) using the upper eight
bits of the 9-bit instruction address. Both the LSB of
the instruction address and a Condition Flag bit, selected
from eight possible branching conditions, are
used to determine whether the next instruction to be
executed is the even address instruction or the odd address
instruction, according to the logic table shown
as Table 4-1.


Table 4-1. Microcode Next Instruction Selection


LSB of Address   Condition Flag State   Next Instruction

0                0 (FALSE)              EVEN
0                1 (TRUE)               EVEN
1                0 (FALSE)              ODD
1                1 (TRUE)               EVEN


For an unconditional branch, the condition flag 
FALSE (which is always zero) is selected; this caus- 
es the LSB of the address to be passed through to 
select the next instruction: LSB = 0 selects EVEN 
and LSB = 1 selects ODD. This allows uncondition- 
al branching to any of the 512 instructions in the 
RAM. For a conditional branch, the LSB of the ad- 
dress is set to a one; this causes the state of the 

condition flag to select the next instruction: FALSE 
selects the ODD instruction and TRUE selects the 
EVEN instruction. Therefore, a conditional branch 
jumps to either the odd or even instruction of an 
odd/even pair depending on the state of the condi- 
tion. 
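

The selection logic of Table 4-1 reduces to a single expression: the odd instruction of the fetched pair is chosen only when the address LSB is one and the selected condition flag is FALSE. The following Python sketch illustrates this; the function name `next_instruction_address` is hypothetical.

```python
def next_instruction_address(naddr, condition_flag):
    """Resolve the 9-bit next address per Table 4-1.

    naddr: 9-bit next instruction address field.
    condition_flag: state of the selected condition flag (FALSE is always 0).
    """
    if not 0 <= naddr < 512:
        raise ValueError("the next address field is 9 bits wide")
    pair_base = naddr & ~1                 # upper eight bits select the pair
    lsb = naddr & 1
    select_odd = lsb == 1 and not condition_flag
    return pair_base | int(select_odd)
```

An unconditional branch uses the FALSE flag, so the LSB passes through unchanged; a conditional branch sets the LSB to one so that the flag picks between the odd and even instruction.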


Instruction Word Field Descriptions 


Each field of the microcode instruction format is de- 
scribed in the following sections. 


NADDR—NEXT INSTRUCTION ADDRESS FIELD 


This field holds the address of the next instruction to
be executed. Taking advantage of the fact that the
microcode RAM is physically organized as 256 deep
by 96 wide (two instructions are fetched per read
cycle), a zero delay two-way branch can be
achieved. The only case in which this field is not
used to determine the address of the next instruction
to be executed is when an instruction writes to
the PC. (The term PC refers to the register that holds
the address of the next instruction to be executed.)
When an instruction loads the PC, a one instruction
delay occurs before the load takes effect. Therefore,
the instruction pointed to by the next instruction field
of the instruction that loads the PC is executed before
the jump to the new address occurs. This is
shown in Table 4-2.


There are no restrictions on the instruction following 
a PC load; it will always be executed, even while 
single stepping the processor or if the processor is 
frozen on that instruction. 


CFSEL—CONDITION FLAG SELECT FIELD 


This field selects which condition flag will be used 
with the LSB of NADDR to select the next instruction 
from the odd/even pair. The condition flag assign- 
ment is given in Table 4-3. 
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Table 4-2. PC Load Example 


Addr | Instruction | NADDR | Comment
     |             |       | Load PC with zero.
55   | r0 = 1      | X     | This instruction is executed but its next address field is ignored.


The PC load takes effect after a one instruction delay;
the result is that r1 = r0 = 1.


Table 4-3. Condition Flag Select Field Assignments


Description

Select for Unconditional Branch
Carry Out from ALU Condition Flag Latch


NOTE:
The ALU condition flags (CARRY, OVF, SIGN, and ZERO) are latched in the ALU Condition Flag register. This register is
updated for most, but not all, ALU operations. The remaining flags (LCNTZ, LSB, and MSB) are updated and latched each
cycle.


ASRC—A BUS SOURCE SELECT FIELD 


This field selects the element that should drive its 
data onto the A bus during the execution of this in- 
struction. The mapping for this and the following 
three fields is provided in Chapter 6. 

ADST—A BUS DESTINATION SELECT FIELD 


This field selects which element should latch data 
from the A bus during the execution of this instruc- 
tion. See ASRC above. 

BSRC—B BUS SOURCE SELECT FIELD 


Same as ASRC, but for B bus. See ASRC above. 


BDST—B BUS DESTINATION SELECT FIELD 
Same as ADST, but for B bus. See ADST above. 


CNT—DECREMENT LOOP COUNTER BIT 


A one in this bit position causes the selected Loop
Counter (selected by LC, the loop counter select bit)
to be decremented. The new value of the loop counter
and the updated LCNTZ condition flag are not
ready until the next instruction cycle. Therefore, in a
loop where the loop counter is decremented and
tested for zero in the same instruction (typically in a
one instruction loop), the start value for the loop
counter should be one less than the number of times
the loop should be executed.
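

The off-by-one rule can be illustrated with a small behavioral model. The Python sketch below is an assumption-based simulation, not a description of the actual hardware: it models a one instruction loop that tests LCNTZ and decrements the counter in the same instruction, with the flag reflecting the counter value from the previous instruction cycle. The function name `one_instruction_loop_iterations` is hypothetical.

```python
def one_instruction_loop_iterations(start_value):
    """Count executions of a one instruction loop that decrements and
    tests the loop counter in the same instruction (behavioral model)."""
    if start_value < 0:
        raise ValueError("start value must be non-negative")
    counter = start_value
    lcntz = counter == 0          # flag latched before the loop starts
    iterations = 0
    while True:
        iterations += 1
        exit_now = lcntz          # branch sees the flag from the prior cycle
        counter -= 1              # decrement requested by the CNT bit
        lcntz = counter == 0      # updated flag is ready only next cycle
        if exit_now:
            break
    return iterations
```

In this model a start value of N - 1 gives N executions of the loop body, matching the rule stated above.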


LIT—LITERAL SELECT BIT 


When this bit is a one, the ASRC and CFSEL fields 
are replaced with a 9-bit literal value that is driven as 
a source in the least significant 9 bits of the A bus. In 
this case, the upper 7 bits of the A bus are forced to 
zeros. The mapping of bits from the literal field to the 
A bus is shown in Figure 4-1. 


NOTE:
A conditional branch and a literal on the A bus are
not allowed in the same instruction. A 3-bit literal
can be placed on the B bus in any instruction.


[Figure: A bus bits 15 through 9 are forced to zero; A bus bits 8 through 0
are driven from the instruction word bits of the ASRC and CFSEL fields.]


Figure 4-1. Literal Field Mapping onto A Bus


SHFT—SHIFT CONTROL FIELD 


This field controls the bit shifting and byte swapping
logic associated with register r0. The encoding of
this field is given in Table 4-4.


Table 4-4. Shift Control Field Coding


SHFT   Operation

00     No Shift or Swap Operation
01     Shift r0 Right One Bit Position, Sign Extend
10     Shift r0 Left One Bit Position, Zero Fill
11     Byte Swap the Value Being Loaded into r0*

*Byte swapping only works when r0 is the destination on the
A bus or the B bus. It does not swap data held in r0, only data
being loaded. In order to byte swap data in register r0, r0
must be both a source and destination for either the A or B
bus.


ALUSS—ALU SOURCE SELECT BITS


These two bits are used as enables for the two ALU
input latches. Bit 39 enables the latch that connects
to the A bus; bit 38 enables the latch connected to
the B bus. A one in either bit position causes the
corresponding input latch to latch the value on the
bus to which it is connected (the A or B bus). A zero
on either bit causes the corresponding latch to hold
its current content. This allows the ALU operands
either to come from "eavesdropping" on the A or B
bus transfers occurring in the current instruction cycle
or to be held for multiple instruction cycles in
either the A or B input latch.


ALUOP—ALU OPERATION CODE FIELD


This field specifies the ALU instruction to be performed
during the current instruction cycle. The encoding
of this field is given in Figure 4-2. Normally, at
the end of the instruction execution, the result of the
ALU operation is latched in the ALU output latch that
can be a source on either the A or B buses. However,
if a NOP is selected for the ALU operation, the
ALU output latch is not latched. The data is held
from the previous instruction. In addition to NOP,
certain other ALU opcodes do not actually perform
ALU operations and therefore do not latch the ALU
results. They are INT (microcode interrupt) and the
PROF instruction.


LC—LOOP COUNTER SELECT BIT 


This bit selects which of the two loop counters is to
be used for decrementing or Loop-Counter-Zero
conditional branching in the current instruction. A
zero selects loop counter zero and a one selects
loop counter one.


Refer to the Intel 82750PB Microcode Programming
Guide for more information on microcode programming.
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Table 4-5. 82750PB Source/Destination Coding 


post | SRC | ADST 
Nuh 


Address (Hex) 


hwid 


saree | treme fe 


Icnt 


lent 


ee a ae 


| r 

r 

r 

r 

68 
| r 
0x10 | 


mcode3 mcode3 


cc . 
nt cnt 
rO 
a ae 2 (2 
a ee 3 3 
ae ee ee ee 5 
: 
7 17 
poeta | rto stat | moot 
es 


r10 


rO 

rt 
r2 ' 
r3 

14 

r6 

r7 

*int 

* 


ee 


0x1 
0x2 
0x3 
0x4 
0x9 
OxA 
OxC 
OxF 


ri 
r3 
r4 - 
r6 
r7 
r8 
ro 
r12 
r14 
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Table 4-5. 82750PB Source/Destination Coding (Continued) 


[Addressttie) | post | esne | aDsT_—__ 
a ee 
Se 


<a Oe 
[owe ation 


ee 
Figure 4-2. 82750PB Instruction Word Format (bits 47 through 24; fields include LC, SHFT, ALU operation, B Bus Destination, and B Bus Source)
Figure 4-2. 82750PB Instruction Word Format (Continued) (bits 23 through 0; fields include A Bus Destination, A Bus Source, Cond Flag Select, and Next Address; Cond Flag Select conditions include FALSE (0x0), CARRY (0x1), OVERFLOW, SIGN (0x3), ZERO, CNT0 (0x5), and LSB)
5.0 ELECTRICAL DATA

Maximum Ratings

Table 5-1 is a stress rating only, and functional operation at the
maximums is not guaranteed. Functional operating conditions are given
in the DC and AC Characteristics (Tables 5-2, 5-3, 5-4, and 5-5).

Exposure to Maximum Ratings may affect device reliability.
Furthermore, although the 82750PB contains protective circuitry to
resist damage from static electrical discharge, always take
precautions to avoid high static voltages or electric fields.


DC Characteristics 


Table 5-1. Absolute Maximum Requirements

Condition          Maximum Requirement
                   -65°C to 110°C
                   -35°C to 150°C
                   -0.5V to VCC + 0.5V
                   -0.5V to +6.5V


Table 5-2. DC Characteristics (VCC = 5V ±10%, TCASE = 0°C to 90°C)


low = — 1.0 mA” 


VSS < VIN < VCC


NOTES:
1. Measured with CLKIN = 8 MHz.
2. Typical current value measured under typical conditions. Maximum current value guaranteed with 50 pF maximum output loading.
3. Not 100% tested.
AC Characteristics

Table 5-3. AC Characteristics at 25 MHz (VCC = 5V ±10%, TCASE = 0°C to +90°C, CL = 50 pF)

Parameter
CLKIN Low Time
CLKIN Fall Time
CLKIN Rise Time
A[31:2], BE#[3:0], WE#, D[31:0], HINT#, PMFRZ# Valid Delay
MREQ#, TRNFR#, RFSH#, NXTFST#, HBUSEN#, HRDY# Valid Delay
A[31:2], BE#[3:0], WE#, D[31:0] Float Delay
MRDY# Setup
MRDY# Hold
A[8:2], BE#[3:0], WE#, D[31:0] Setup (4 ns; Figure 5-3)
A[8:2], BE#[3:0], WE#, D[31:0] Hold
HREG#, HRAM# Setup
HREG#, HRAM# Hold
CLKOUT Valid Delay (18)
[parameter name illegible] (1/2 t1 + 6)
CLKOUT High Time

NOTES:
1. This assumes 40 ns period. For other speeds these values should fall between 40% to 60% duty cycle.
2. Not 100% tested. Guaranteed by design characterization.
3. Inputs must remain valid throughout all cycles of host accesses. See Figures 3-6 through 3-8.
4. All A.C. specifications are measured at the 1.5V crossing point with a 50 pF load.
Figure 5-1. Clock Waveforms

NOTES:
ty = t8, t10, t12, t14 (setup times)
tz = t9, t11, t13, t15 (hold times)

Figure 5-4. CLKOUT Waveforms
Output Delay and Rise Time Versus Load Capacitance

NOTE: This graph will not be linear outside of the

Figure 5-6. Typical Output Rise Time Versus Load Capacitance under Worst Case Conditions (axes: rise time in ns, 0.8V-2.0V, versus CL in picofarads)
6.0 MECHANICAL DATA

Packaging Outlines and Dimensions

Intel packages the 82750PB in a Plastic Quad Flat Pack (PQFP). Table 6-1 gives the symbol list for the PQFP.

Table 6-1. PQFP Symbol List

Symbol   Description
A        Package Height: Distance from Seating Plane to Highest Point of Body
A1       Standoff: Distance from Seating Plane to Base Plane

The PQFP has the following specifications:
1. All dimensions and tolerances conform to ANSI Y14.5M-1982.
2. Datum plane -H- is located at the mold parting line and coincident with the bottom of the lead where lead exits plastic body.
3. Datums A-B and -D- are to be determined where center leads exit plastic body at datum plane -H-.
4. Controlling dimension is the inch.
5. Dimensions D1, D2, E1, and E2 are measured at the mold parting line and do not include mold protrusion. Allowable mold protrusion is 0.18 mm (0.007 in.) per side.
6. Pin 1 identifier is located within one of the two zones indicated.
7. Measured at datum plane -H-.
8. Measured at seating plane datum -C-.
Table 6-2 provides outline characteristics for 0.025 in. pitch.

Table 6-2. Intel Case Outline Drawings for PQFP at 0.025 inch Pitch

Figure 6-1. Principal Dimensions of the 82750PB in the 132-Lead PQFP Package
Figure 6-2. Detailed Dimensions of the 82750PB in the 132-Lead PQFP—Molding Details (dimensions in mm (inch))

Figure 6-3. Detailed Dimensions of the 82750PB in the 132-Lead PQFP—Terminal Details (dimensions in mm (inch))
Figure 6-4. 132-Lead PQFP Mechanical Package Detail—Protective Bumper (dimensions in mm (inch))

Figure 6-5. 132-Lead PQFP Mechanical Package Detail—Typical Lead (Detail J and Detail L; dimensions in mm (inch))
NOTES:
1. ALL DIMENSIONS AND TOLERANCES CONFORM TO ANSI Y14.5M-1982.
2. DATUM PLANE -H- LOCATED AT THE MOLD PARTING LINE AND COINCIDENT WITH THE BOTTOM OF THE LEAD WHERE LEAD EXITS PLASTIC BODY.
3. DATUMS A-B AND -D- TO BE DETERMINED WHERE CENTER LEADS EXIT PLASTIC BODY AT DATUM PLANE -H-.
4. CONTROLLING DIMENSION, INCH.
5. DIMENSIONS D1, D2, E1 AND E2 ARE MEASURED AT THE MOLD PARTING LINE. D1 AND E1 DO NOT INCLUDE AN ALLOWABLE MOLD PROTRUSION OF 0.18 MM (.007 IN) PER SIDE. D2 AND E2 DO NOT INCLUDE A TOTAL ALLOWABLE MOLD PROTRUSION OF 0.18 MM (.007 IN) AT MAXIMUM PACKAGE SIZE.
6. PIN 1 IDENTIFIER IS LOCATED WITHIN ONE OF THE TWO ZONES INDICATED.
7. MEASURED AT DATUM PLANE -H-.
8. MEASURED AT SEATING PLANE DATUM -C-.
Package Thermal Specifications

The 82750PB is specified for operation when TC (the case
temperature) is within the range of 0°C to 90°C. TC may be measured
in any environment to determine whether the 82750PB is within
specified operation range. The case temperature should be measured
at the center of the top surface.

TA (the ambient temperature) can be calculated from θCA (thermal
resistance from case to ambient) with the following equation:

TA = TC - P × θCA

Typical values for θCA at various airflows are given in Table 6-3
for the 132-lead PQFP package. Table 6-4 shows the maximum TA
allowable (without exceeding TC) at various airflows. The power
dissipation (P) is calculated by using the typical supply current at
5V as shown in Table 5-2.
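
The ambient-temperature relation above can be sketched as a short calculation. This is only an illustration: the function names are hypothetical, and the supply current and thermal resistance used in the usage lines are placeholder numbers, not values taken from Table 5-2 or Table 6-3.

```python
def power_dissipation_w(supply_current_a, supply_voltage_v=5.0):
    """P = I * V, using the typical supply current at the 5V supply."""
    return supply_current_a * supply_voltage_v


def max_ambient_temp_c(t_case_max_c, power_w, theta_ca_c_per_w):
    """TA = TC - P * theta_CA, with temperatures in degrees Celsius."""
    return t_case_max_c - power_w * theta_ca_c_per_w


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not datasheet values):
    # 0.4 A supply current and 20 °C/W case-to-ambient resistance.
    p = power_dissipation_w(0.4)
    print(max_ambient_temp_c(90.0, p, 20.0))
```

With real figures, TC would be the 90°C case limit, P would come from the typical supply current in Table 5-2, and θCA would be read from Table 6-3 for the airflow in question.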


Table 6-3. Thermal Resistance (°C/W)

θCA Versus Airflow, ft/min (m/sec)

Package          0      200      400      600      800      1000
                 (0)    (1.01)   (2.03)   (3.04)   (4.06)   (5.07)
132-Lead PQFP

Table 6-4. Maximum TA at Various Airflows (°C)

TA Versus Airflow, ft/min (m/sec)

Package          Frequency   0      200      400      600      800      1000
                 (MHz)       (0)    (1.01)   (2.03)   (3.04)   (4.06)   (5.07)
132-Lead PQFP


i860™ Microprocessor Family

PRELIMINARY

i860™ XP MICROPROCESSOR


■ Parallel Architecture that Supports Up to Three Operations per Clock
  — One Integer or Control Instruction
  — Up to Two Floating-Point Results

■ High Performance Design
  — 40/50 MHz Clock Rate
  — 100 Peak Single Precision MFLOPS
  — 75 Peak Double Precision MFLOPS
  — 64-Bit External Data Bus
  — 64-Bit Internal Code Bus
  — 128-Bit Internal Data Bus

■ High Integration on One Chip
  — 32-Bit Integer and Control Unit
  — 32/64-Bit Pipelined Floating-Point
  — 64-Bit 3-D Graphics Unit
  — Paging Unit with 64 Four-Kbyte and 16 Four-Mbyte Pages
  — 16 Kbyte Code Cache
  — 16 Kbyte Data Cache

■ Fast, Multiprocessor-Oriented Bus
  — Burst Cycles Move 400 Mbyte/Sec
  — Hardware Cache Snooping
  — MESI Cache Consistency Protocol
  — Supports Second-Level Cache
  — Supports DRAM

■ Compatible with Industry Standards
  — ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic
  — Intel386™/Intel486™/i860™ Data Formats and Page Table Entries
  — Binary Compatible with i860™ XR Applications Instruction Set
  — Detached Concurrency Control Unit (CCU) Supports Parallel Architecture Extensions (PAX)
  — JEDEC 262-pin Ceramic Pin Grid Array Package
  — IEEE Standard 1149.1/D6 Boundary-Scan Architecture

■ Easy to Use
  — On-Chip Debug Register
  — UNIX*/860
  — APX Attached Processor Executive
  — Assembler, Linker, Simulator, Debugger, C and FORTRAN Compilers, FORTRAN Vectorizer, Scalar and Vector Math Libraries
  — Graphics Libraries

The Intel i860 XP Microprocessor (order code A80860XP) delivers supercomputing performance in a single
VLSI component. The 32/64-bit architecture of the i860 XP microprocessor balances integer, floating point,
and graphics performance for applications such as engineering workstations, scientific computing, 3-D graphics
workstations, and multiuser systems. Its parallel architecture achieves high throughput with RISC design
techniques, multiprocessor support, pipelined processing units, wide data paths, large on-chip caches, 2.5
million transistor design, and fast 0.8-micron silicon technology.


Figure 0.1. Block Diagram (blocks shown include the RISC core, paging unit, data cache, FP register file, FP multiplier unit, FP adder unit, and graphics unit; external connections include A31-A3 and D63-D0)


*UNIX is a registered trademark of UNIX System Laboratories, Inc. 
intel, i860, Intel386 and Intel486 are trademarks of Intel Corporation. 
1.0 FUNCTIONAL DESCRIPTION


As shown by the block diagram on the front page, 
the i860 XP Microprocessor consists of the following 
units: 


1. Integer Registers and Core Execution Unit 
2. Floating-Point Registers and Control Unit 
3. Floating-Point Adder Unit 

4. Floating-Point Multiplier Unit 

5. Graphics Unit 

6. Paging Unit 

7. Instruction Cache 

8. Data Cache 

9. Bus and Cache Control Unit 

10. Detached Concurrency Control Unit 


The core execution unit controls overall operation of the i860 XP
microprocessor. It executes load, store, integer, bit, I/O, and
control-transfer operations, and fetches instructions for the
floating-point unit as well. A set of 32 × 32-bit general-purpose
registers is provided for the manipulation of integer data. Load and
store instructions move 8-, 16-, and 32-bit data to and from these
registers. Its full set of integer, logical, and control-transfer
instructions gives the core unit the ability to execute complete
systems software and applications programs. A trap mechanism provides
rapid response to exceptions and external interrupts. Debugging is
supported by the ability to trap on data or instruction reference.

The floating-point hardware is connected to a separate set of
floating-point registers, which can be accessed as 16 × 64-bit
registers or as 32 × 32-bit registers. Load and store instructions
can also access these same registers as 8 × 128-bit registers. All
floating-point and graphics instructions use these registers as their
source and destination operands.

The floating-point control unit controls both the floating-point
adder and the floating-point multiplier, issuing instructions,
handling all source and result exceptions, and updating status bits
in the floating-point status register. The adder and multiplier can
operate in parallel, producing up to two results per clock. The
floating-point data types, floating-point instructions, and exception
handling all support the IEEE Standard for Binary Floating-Point
Arithmetic (ANSI/IEEE Std 754-1985).

The floating-point adder performs addition, subtraction, comparison,
and conversions on 64- and 32-bit floating-point values. An adder
instruction executes in three clocks; however, in pipelined mode, a
new result is generated every clock.

The floating-point multiplier performs floating-point and integer
multiply as well as floating-point reciprocal operations on 64- and
32-bit floating-point values. A multiplier instruction executes in
three to four clocks; however, in pipelined mode, a new result can be
generated every clock for single-precision and every other clock for
double precision.

The graphics unit supports three-dimensional drawing in a graphics
frame buffer, with color intensity shading and hidden surface
elimination via the Z-buffer algorithm. The graphics unit recognizes
the pixel as an 8-, 16-, or 32-bit integer data type. It can compute
individual red, blue, and green color intensity values within a
pixel; but it does so with parallel operations that take advantage of
the 64-bit internal word size and 64-bit external bus. The graphics
features of the i860 XP microprocessor assume that the surface of a
solid object is drawn with polygon patches which, like the pieces of
a puzzle, collectively approximate the shape of the original object.
The color intensities of the vertices of the polygon and their
distances from the viewer are known, but the distances and
intensities of the other points must be calculated by interpolation.
The graphics instructions of the i860 XP microprocessor directly aid
such interpolation.

The paging unit implements protected, paged, virtual memory. The
paging unit uses two four-way set-associative cache memories called
TLBs (Translation Lookaside Buffers) to perform the translation of
logical address to physical address, and to check for access
violations. The access protection scheme employs two levels of
privilege: user and supervisor. One TLB supports 4 Kbyte pages, and
has 64 entries; the other supports 4 Mbyte pages, and has 16 entries.

The instruction cache is a four-way set-associative memory of 16
Kbytes, with 32-byte lines. It transfers up to 64 bits per clock (400
Mbyte/sec at 50 MHz).

The data cache is a four-way set-associative memory of 16 Kbytes,
with 32-byte lines. It transfers up to 128 bits per clock (800
Mbyte/sec at 50 MHz). The i860 XP microprocessor normally uses
write-back caching, i.e. memory writes update the cache (if
applicable) without necessarily updating memory immediately; however,
under both software and hardware control, write-through and
write-once policies can be implemented, or caching can be inhibited.
The caches are transparent to applications software.

The bus and cache control unit performs data and instruction accesses
for the core unit. It receives cycle requests and specifications from
the core unit, performs the data-cache or instruction-cache miss
processing, controls TLB translation, and provides the interface to
the external bus. Its pipelined structure supports up to three
outstanding bus cycles. Its burst mode transfers data at up to 400
Mbyte/sec at 50 MHz. In multiprocessor systems, it maintains cache
consistency by monitoring bus activity in parallel with other CPU
functions.

The DCCU (detached concurrency control unit) is a compatible subset
of the external CCU that expedites loop-level parallelism and
synchronization in multiprocessor systems. The DCCU consists of
registers and a counter that allow a single i860 XP microprocessor to
run binary code compiled for a multiprocessor system adhering to the
PAX parallel applications binary interface (ABI).

The i860 XP microprocessor may be used with or without an external,
secondary cache built from 82495XP and 82490XP cache components. An
82495XP and 82490XP cache provides up to 512 Kbytes of high-speed
storage for data and instruction combined. In most cases, an 82495XP
and 82490XP cache can provide data to the CPU with zero wait states.
The larger size of an external cache can provide an increased hit
rate when the size or number of data structures and programs exceeds
the size of the internal caches. In multiprocessor systems, the
external cache serves as local memory, and can reduce bus traffic. An
external cache also hides the processor from the rest of the system,
which is a double advantage:

1. The processor can be upgraded without affecting design of the
   memory and other subsystems.
2. Slower and less expensive memory and I/O subsystem designs can be
   employed without unduly lowering overall system performance.

Refer to the 82495XP Cache Controller/82490XP Cache RAM Data Sheet
(Intel Order #240956) for more information.


2.0 PROGRAMMING INTERFACE

The programmer-visible aspects of the architecture of the i860 XP
microprocessor include data types, registers, instructions, and
traps.


2.1 Data Types

The i860 XP microprocessor provides operations for integer and
floating-point data. Integer operations are performed on 32-bit
operands with some support also for 64-bit operands. Load and store
instructions can reference 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit
operands. Floating-point operations are performed on IEEE-standard
32- and 64-bit formats. Graphics instructions operate on arrays of
8-, 16-, or 32-bit pixels.


2.1.1 INTEGER

An integer is a 32-bit signed value in standard two's complement
form. A 32-bit integer can represent a value in the range
-2,147,483,648 (-2^31) to 2,147,483,647 (+2^31 - 1). Arithmetic
operations on 8- and 16-bit integers can be performed by
sign-extending the 8- or 16-bit values to 32 bits, then using the
32-bit operations.

There are also add and subtract instructions that operate on 64-bit
long integers.

Load and store instructions may also reference (in addition to the
32- and 64-bit formats previously mentioned) 8- and 16-bit items in
memory. When an 8- or 16-bit item is loaded into a register, it is
converted to an integer by sign-extending the value to 32 bits. When
an 8- or 16-bit item is stored from a register, the corresponding
number of low-order bits of the register are used.


2.1.2 ORDINAL

Arithmetic operations are available for 32-bit ordinals. An ordinal
is an unsigned integer. An ordinal can represent values in the range
0 to 4,294,967,295 (+2^32 - 1).

Also, there are add and subtract instructions that operate on 64-bit
ordinals.


2.1.3 SINGLE- AND DOUBLE-PRECISION REAL

Figure 2.1 shows the real number formats. A single-precision real
(also called "single real") data type is a 32-bit binary
floating-point number. Bit 31 is the sign bit; bits 30..23 are the
exponent; and bits 22..0 are the fraction. In accordance with
ANSI/IEEE standard 754, the value of a single-precision real is
defined as follows:

1. If e = 0 and f ≠ 0 or e = 255, then generate a floating-point
   source-exception trap when encountered in a floating-point
   operation.
2. If 0 < e < 255, then the value is (-1)^s × 1.f × 2^(e-127).
3. If e = 0 and f = 0, then the value is signed zero.

A double-precision real (also called "double real") data type is a
64-bit binary floating-point number. Bit 63 is the sign bit; bits
62..52 are the exponent; and bits 51..0 are the fraction. In
accordance with ANSI/IEEE standard 754, the value of a
double-precision real is defined as follows:

1. If e = 0 and f ≠ 0 or e = 2047, then generate a floating-point
   source-exception trap when encountered in a floating-point
   operation.
2. If 0 < e < 2047, then the value is (-1)^s × 1.f × 2^(e-1023).
3. If e = 0 and f = 0, then the value is signed zero.


The special values infinity, NaN ("Not a Number"), indefinite, and
denormal generate a trap when encountered. The trap handler
implements IEEE-standard results.
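
The three-case value rules for single and double reals can be sketched as a small decoder. This is a minimal illustration under stated assumptions, not the processor's implementation: the function name `decode_real`, the `FORMATS` table, and the `("trap", None)` return convention are hypothetical, chosen only to mirror the rules in the text (special values are reported as a trap rather than evaluated).

```python
# (exponent bits, fraction bits, exponent bias) for each operand width
FORMATS = {32: (8, 23, 127), 64: (11, 52, 1023)}


def decode_real(bits, width=32):
    """Classify and evaluate a real from its raw bit pattern.

    Returns ("trap", None) for patterns the text says cause a
    source-exception trap, ("zero", +/-0.0) for signed zero, and
    ("normal", value) otherwise.
    """
    exp_bits, frac_bits, bias = FORMATS[width]
    s = (bits >> (width - 1)) & 1
    e = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    f = bits & ((1 << frac_bits) - 1)
    e_max = (1 << exp_bits) - 1          # 255 or 2047

    if e == e_max or (e == 0 and f != 0):
        return "trap", None              # infinity, NaN, denormal
    if e == 0:
        return "zero", (-0.0 if s else 0.0)
    sign = -1.0 if s else 1.0
    significand = 1.0 + f / (1 << frac_bits)   # the "1.f" term
    return "normal", sign * significand * 2.0 ** (e - bias)


if __name__ == "__main__":
    print(decode_real(0x3F800000))             # single-precision pattern
    print(decode_real(0x7F800000))             # exponent field all ones
    print(decode_real(0x4000000000000000, 64)) # double-precision pattern
```

The decoder only classifies; what a trap handler then does to produce IEEE-standard results is outside the scope of this sketch.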


A double real value occupies an even/odd pair of 
floating-point registers. Bits 31..0 are stored in the 
even-numbered floating-point register; bits 63..32 
are stored in the next higher odd-numbered floating- 
point register. 
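
The even/odd register pairing for double reals can be illustrated with a small model. This is a sketch only: the dictionary used as a register file and the helper names `split_double` and `join_double` are hypothetical, and the code models nothing beyond the bit placement described above.

```python
def split_double(bits64, even_reg):
    """Place a 64-bit value into an even/odd floating-point register pair.

    Bits 31..0 go to the even-numbered register, bits 63..32 to the
    next higher odd-numbered register. Returns {register number: 32-bit value}.
    """
    if even_reg % 2 != 0 or not 0 <= even_reg <= 30:
        raise ValueError("a double real must start at an even register f0..f30")
    low = bits64 & 0xFFFFFFFF
    high = (bits64 >> 32) & 0xFFFFFFFF
    return {even_reg: low, even_reg + 1: high}


def join_double(regs, even_reg):
    """Rebuild the 64-bit value from the even/odd pair."""
    return (regs[even_reg + 1] << 32) | regs[even_reg]


if __name__ == "__main__":
    pair = split_double(0x3FF0000000000000, 2)   # hypothetical use of f2/f3
    print({f"f{n}": hex(v) for n, v in pair.items()})
    print(hex(join_double(pair, 2)))
```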


Figure 2.1. Real Number Formats (single-precision real and double-precision real, each with SIGN, EXPONENT, and FRACTION fields)


2.1.4 PIXEL

A pixel may be 8-, 16-, or 32-bits long, depending on color and
intensity resolution requirements. Regardless of the pixel size, the
i860 XP microprocessor always operates on 64 bits of pixel data at a
time. The pixel data type is used by two kinds of instructions:

• The selective pixel-store instruction that helps implement hidden
  surface elimination.
• The pixel add instruction that helps implement 3-D color intensity
  shading.

To perform color intensity shading efficiently in a variety of
applications, the i860 XP microprocessor defines three pixel formats
according to Table 2.1.

Figure 2.2 illustrates one way of assigning meaning to the fields of
pixels. These assignments are for illustration purposes only. The
i860 XP microprocessor defines only the field sizes, not the specific
use of each field. Other ways of using the fields of pixels are
possible.
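
Because pixel data is always handled 64 bits at a time, the number of pixels per operand follows directly from the pixel size. The sketch below illustrates that arithmetic; the function names are hypothetical, and treating pixel 0 as the low-order field of the 64-bit word is an assumption made for the example.

```python
PIXEL_SIZES = (8, 16, 32)   # pixel widths named in the text, in bits


def pixels_per_operand(pixel_bits):
    """Number of pixels that fit in one 64-bit operand."""
    if pixel_bits not in PIXEL_SIZES:
        raise ValueError("pixel size must be 8, 16, or 32 bits")
    return 64 // pixel_bits


def unpack_pixels(word64, pixel_bits):
    """Split a 64-bit word into pixels, low-order pixel first (assumed order)."""
    mask = (1 << pixel_bits) - 1
    count = pixels_per_operand(pixel_bits)
    return [(word64 >> (i * pixel_bits)) & mask for i in range(count)]


if __name__ == "__main__":
    for size in PIXEL_SIZES:
        print(size, pixels_per_operand(size))
    print(unpack_pixels(0x0000000200000001, 32))
```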


Table 2.1. Pixel Formats

Pixel Size   Bits of        Bits of        Bits of        Bits of Other
(in bits)    Color 1        Color 2        Color 3        Attribute
             Intensity(1)   Intensity(1)   Intensity(1)   (Texture, Color)

8            N (≤ 8) bits of intensity(2)

NOTES:
1. The intensity attribute fields may be assigned to colors in any order convenient to the application.
2. With 8-bit pixels, up to 8 bits can be used for intensity; the remaining bits can be used for any other attribute, such as color or texture. Bits that require interpolation (shading), such as those for intensity, must be the low-order bits of the pixel.
Figure 2.2. Pixel Format Example (8-bit, 16-bit, and 32-bit pixels; example fields RED, GREEN, BLUE, and TEXTURE)

NOTE:
These assignments of specific meanings to the fields of pixels are for illustration only. Only the field sizes are defined, not the specific use of each field.


2.2 Register Set 

As Figure 2.3 shows, the i860 XP microprocessor has the following
registers:

• An integer register file
• A floating-point register file
• Control registers psr, epsr, db, dirbase, fir, fsr, bear, ccr, p3,
  p2, p1, p0
• Special-purpose registers KR, KI, T, MERGE, STAT, and NEWCURR

The control registers are accessible only by load and store
control-register instructions; the integer and floating-point
registers are accessed by arithmetic operations and load and store
instructions. The special-purpose registers KR, KI, and T are used by
floating-point instructions; MERGE is used by graphics instructions.
NEWCURR and STAT are used for concurrency control; they are accessed
by memory load and store instructions.


2.2.1 INTEGER REGISTER FILE

There are 32 integer registers, each 32 bits wide, referred to as r0
through r31, which are used for address computation and scalar
integer computations. Register r0 always returns zero when read.


2.2.2 FLOATING-POINT REGISTER FILE

There are 32 floating-point registers, each 32-bits wide, referred to
as f0 through f31, which are used for floating-point computations.
Registers f0 and f1 always return zero when read. The floating-point
registers are also used by a set of integer operations, primarily for
graphics computations.

When accessing 64-bit floating-point or integer values, the i860 XP
microprocessor uses an even/odd pair of registers. When accessing
128-bit values, it uses an aligned set of four registers (f0, f4, f8,
f12, f16, f20, f24, or f28). The instruction must designate the
lowest register number of the set of registers containing 64- or
128-bit values. Misaligned register numbers produce undefined
results. The register with the lowest number contains the least
significant part of the value. For 128-bit values, the register pair
with the lower number contains the value from the lower memory
address; the register pair with the higher number contains the value
from the higher address.

The 128-bit load and store instructions, along with the 128-bit data
path between the floating-point registers and the data cache, help to
sustain an extraordinarily high rate of computation.
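
The register-alignment rule for 64- and 128-bit floating-point operands can be expressed as a simple check. This is an illustrative sketch, assuming a hypothetical helper API (`is_aligned`, `registers_used`); it only models the numbering rule stated above, and since misaligned numbers are described as producing undefined results, the sketch rejects them instead of modeling any behavior.

```python
# Operand width in bits -> number of 32-bit floating-point registers used
REGS_PER_OPERAND = {32: 1, 64: 2, 128: 4}


def is_aligned(reg, width_bits):
    """True if f<reg> is a valid lowest register for an operand of this width."""
    if not 0 <= reg <= 31:
        raise ValueError("floating-point registers are f0 through f31")
    return reg % REGS_PER_OPERAND[width_bits] == 0


def registers_used(reg, width_bits):
    """Registers holding the operand, least significant part first."""
    if not is_aligned(reg, width_bits):
        raise ValueError("misaligned register number: result would be undefined")
    return list(range(reg, reg + REGS_PER_OPERAND[width_bits]))


if __name__ == "__main__":
    print(registers_used(8, 128))   # aligned set of four
    print(registers_used(6, 64))    # even/odd pair
    print(is_aligned(6, 128))       # f6 is not a multiple of four
```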


Figure 2.3. Registers and Data Paths


2.2.3 PROCESSOR STATUS REGISTER

The processor status register (psr) contains miscellaneous state
information for the current process. Figure 2.4 shows the format of
the psr.

• BR (Break Read) and BW (Break Write) enable a data access trap when
  the operand address matches the address in the db register and a
  read or write (respectively) occurs.

• Various instructions set CC (Condition Code) according to tests
  they perform. The branch-on-condition-code instructions test its
  value. The bla instruction sets and tests LCC (Loop Condition
  Code).

• IM (Interrupt Mode), if set, enables external interrupts on the INT
  pin; disables interrupts on INT if clear. IM does not affect parity
  error interrupts or interrupts on the BERR pin.

• U (User Mode) is set when the i860 XP microprocessor is executing
  in user mode; it is clear when the i860 XP microprocessor is
  executing in supervisor mode. In user mode, writes to some control
  registers are inhibited. This bit also controls the memory
  protection mechanism.

• PIM (Previous Interrupt Mode) and PU (Previous User Mode) save the
  corresponding status bits (IM and U) on a trap, because those
  status bits are changed when a trap occurs. They are restored into
  their corresponding status bits when returning from a trap handler
  with a branch indirect instruction when a trap flag is set in the
  psr.

• FT (Floating-Point Trap), DAT (Data Access Trap), IAT (Instruction
  Access Trap), IN (Interrupt), and IT (Instruction Trap) are trap
  flags. They are set when the corresponding trap condition occurs.
  IN is set on INT, bus error and parity error. The trap handler
  examines these bits (and other trap bits in the epsr) to determine
  which condition or conditions have caused the trap.

• DS (Delayed Switch) is set if a trap occurs during the instruction
  before dual-instruction mode is entered or exited. If DS is set and
  DIM (Dual Instruction Mode) is clear, the i860 XP microprocessor
  switches to dual-instruction mode one instruction after returning
  from the trap handler. If DS and DIM are both set, the i860 XP
  microprocessor switches to single-instruction mode one instruction
  after returning from the trap handler.

• When a trap occurs, the i860 XP microprocessor sets DIM if it is
  executing in dual-instruction mode; it clears DIM if it is
  executing in single-instruction mode. If DIM is set after returning
  from a trap handler, the i860 XP microprocessor resumes execution
  in dual-instruction mode.

• When KNF (Kill Next Floating-Point Instruction) is set, the next
  floating-point instruction is suppressed (except that its
  dual-instruction mode bit is interpreted). A trap handler sets KNF
  if the trapped floating-point instruction should not be reexecuted.

• SC (Shift Count) stores the shift count used by the last
  right-shift instruction. It controls the number of shifts executed
  by the double-shift instruction.

• PS (Pixel Size) and PM (Pixel Mask) are used by the pixel-store and
  other graphics instructions. The values of PS control pixel size as
  defined by Table 2.2. The bits in PM correspond to pixels to be
  updated by the pixel-store instruction pst.d. The low-order bit of
  PM corresponds to the low-order pixel of the 64-bit source operand
  of pst.d. The number of low-order bits of PM that are actually used
  is the number of pixels that fit into 64 bits, which depends upon
  PS. If a bit of PM is set, then pst.d stores the corresponding
  pixel. Refer also to the pst.d instruction in section 10.

Figure 2.4. Processor Status Register

Table 2.2. Values of PS (pixel sizes listed: 8, 16, and 32 bits, and (undefined))


2.2.4 EXTENDED PROCESSOR STATUS REGISTER

The extended processor status register (epsr) contains additional
state information for the current process beyond that stored in the
psr. Figure 2.5 shows the format of the epsr.

• The processor type is 2 for the i860 XP microprocessor.

• The stepping number has a unique value that distinguishes among
  different revisions of the processor.

• IL (Interlock) is set if a trap occurs after a lock instruction but
  before the last BRDY# of the load or store following the subsequent
  unlock instruction. IL indicates to the trap handler that a locked
  sequence has been interrupted. When the trap handler finds IL set,
  it should scan backwards for the lock instruction and restart at
  that point. The absence of a lock instruction within 30-33
  instructions of the trap indicates a programming error.


[Figure 2.5 field labels recovered: INTERLOCK, WRITE-PROTECT MODE,
PARITY ERROR FLAG*, STEPPING NUMBER, PROCESSOR TYPE, BUS ERROR FLAG*,
INTERRUPT, DATA CACHE SIZE, PAGE-TABLE BIT MODE, BIG ENDIAN MODE,
OVERFLOW FLAG, BEF OR PEF AT SUPERVISOR LEVEL, TRAP ON DELAYED
INSTRUCTION, TRAP ON AUTOINCREMENT, TRAP ON PIPELINE USE, PIPELINE
INSTRUCTION, STRONG ORDERING MODE. Legend: reserved by Intel
Corporation; can be written only from supervisor level; read only
(not writable by software); * reserved in the 80860XR CPU.]



Figure 2.5. Extended Processor Status Register 


• WP (write protect) controls the semantics of the
W bit of page table entries. A clear W bit in either
the directory or the page table entry causes
writes to be trapped. When WP is clear, writes
are trapped in user mode, but not in supervisor
mode. When WP is set, writes are trapped in both
user and supervisor modes.

• PEF (parity error flag) is set by the i860 XP
microprocessor when a parity error trap occurs. As
soon as PEF is set, further parity error and bus
error traps are masked. Software must clear PEF
to reenable such traps. PEF is set at RESET.

• BEF (bus error flag) is set by the i860 XP
microprocessor when the BERR pin is asserted,
indicating a bus error. As soon as BEF is set, further
parity error and bus error traps are masked.
Software must clear BEF to reenable such traps. BEF
is set at RESET.

• INT (Interrupt) is the value of the INT input pin.

• DCS (Data Cache Size) is a read-only field that
tells the size of the on-chip data cache. The number
of bytes actually available is 2^(12 + DCS);
therefore, a value of zero indicates 4 Kbytes, one
indicates 8 Kbytes, etc. The value of DCS for the
i860 XP microprocessor is two, which indicates
16 Kbytes.


• PBM (Page-Table Bit Mode) has no effect in
the i860 XP microprocessor. PBM is used by the
i860 XR microprocessor.

• BE (Big Endian) controls the ordering of bytes
within a data item in memory. Normally (i.e. when
BE is clear) the i860 XP microprocessor operates
in little endian mode, in which the addressed byte
is the low-order byte. When BE is set (big endian
mode), the low-order three bits of all 32-bit data
load and store addresses are complemented,
then masked to the appropriate boundary for
alignment. This causes the addressed byte to be
the most significant byte. Big endian mode
affects not only the memory load and store
instructions but also the ldio, stio, ldint, and scyc
instructions.


• OF (Overflow Flag) is set by adds, addu, subs,
and subu when integer overflow occurs. For
adds and subs, OF is set if the carry from bit 31
is different from the carry from bit 30. For addu,
OF is set if there is a carry from bit 31. For subu,
OF is set if there is no carry from bit 31. Under all
other conditions, it is cleared by these
instructions. OF may be changed by arithmetic
instructions in either user or supervisor mode. It may be
changed by the st.c instruction in supervisor
mode only. OF controls the function of the intovr
instruction. Inside the trap handler, OF may not
be valid for traps other than one caused by
intovr.


• BS (bus or parity error trap in supervisor mode) is
set by the i860 XP microprocessor when a bus or
parity error occurs during a supervisor mode
memory access cycle. This is true even though
the processor may have switched to user mode
by the time these errors are reported. The BS bit
contains valid information only if BERR is asserted
in the same clock as BRDY# or one clock
after that. In all other conditions the contents of
the BS bit are undefined. The operating system
can use this bit to decide, for example, whether
to abort the process (user mode) or reboot the
system (supervisor mode).

• DI (trap on delayed instruction) is set by the
i860 XP microprocessor when a trap occurs on a
delayed instruction (the instruction located after a
delayed branch instruction). When DI is set, the
trap handler must restart the interrupted procedure
from the branch instruction rather than at
the address in fir.

• TAI (trap on autoincrement instruction) is set by
the i860 XP microprocessor when a trap occurs
on an instruction with autoincrement. When TAI is
set, the trap handler should undo the autoincrement
(that is, restore src2 to its original value).

• PT (trap on pipeline use) indicates to the i860 XP
microprocessor that a trap should be generated
and PI should be set when it executes an instruction
that uses the floating-point or graphics unit.
Such instructions include all the instructions
designated "Floating-Point Unit" in Table 2.9, plus
the pfld instruction. PT is set and cleared only by
software. It can be used by the trap handler to
avoid unnecessary saving and restoring of the
pipelines (refer to section 2.8). When a trap due
to PT occurs, the floating-point operation has not
started, and the pipelines have not been advanced.
Such a trap also sets the IT bit of psr.

• The behavior of PI (pipeline instruction) depends
on the setting of PT. If PT = 0, the i860 XP
microprocessor sets PI when any pipelined instruction
or pfld is executed. If PT = 1, the processor
sets PI and traps when it decodes any instruction
that uses the pipes, whether scalar or pipelined.
PI may be set even if KNF is set and the next
floating-point instruction is suppressed. Refer to
section 2.8.

• SO (strong ordering) indicates whether the
processor is in strong ordering mode (SO = 1) or weak
ordering mode (SO = 0). SO is set if the EWBE#
pin is active (LOW) at RESET. (Refer to the
paragraphs on write cycle reordering in section 5.)
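
Three of the epsr field descriptions above are arithmetic rules, so a short sketch may help in checking them by hand. The Python below is illustrative only: the function names are hypothetical, subtraction is modelled as a + ~b + 1 (an assumption about how the carry is formed), and the big endian rule is applied to 1-, 2- and 4-byte accesses.

```python
MASK32 = 0xFFFFFFFF
MASK31 = 0x7FFFFFFF


def data_cache_bytes(dcs):
    """On-chip data cache size in bytes: 2 ** (12 + DCS)."""
    return 1 << (12 + dcs)


def big_endian_address(addr, size_bytes):
    """Address used when BE is set: complement the low-order three
    bits, then mask to the alignment boundary of the access."""
    return ((addr ^ 0b111) & ~(size_bytes - 1)) & MASK32


def _carries(a, b, carry_in):
    """Return (carry out of bit 30, carry out of bit 31) for a + b + carry_in."""
    c30 = ((a & MASK31) + (b & MASK31) + carry_in) >> 31
    c31 = ((a & MASK32) + (b & MASK32) + carry_in) >> 32
    return c30, c31


def overflow_flag(op, a, b):
    """OF as described for adds, addu, subs and subu.

    Assumption: a - b is modelled as a + ~b + 1.
    """
    if op in ("subs", "subu"):
        c30, c31 = _carries(a, ~b & MASK32, 1)
    else:
        c30, c31 = _carries(a, b, 0)
    if op in ("adds", "subs"):
        return c30 != c31      # signed overflow
    if op == "addu":
        return c31 == 1        # carry out of bit 31
    if op == "subu":
        return c31 == 0        # no carry out of bit 31 (borrow)
    raise ValueError(op)
```

With DCS = 2 the first function gives 16384 bytes, matching the 16 Kbytes quoted for the i860 XP microprocessor.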


[Figure 2.6 field labels recovered: LATE BACK-OFF MODE*, CODE SIZE 8-BITS,
REPLACEMENT BLOCK, REPLACEMENT CONTROL]
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2.2.5 DATA BREAKPOINT REGISTER 


The data breakpoint register (db) is used to generate
a trap when the i860 XP microprocessor accesses
an operand at the virtual address stored in this
register. The trap is enabled by BR and BW in psr.
When comparing, a number of low-order bits of the
address are ignored, depending on the size of the
operand. For example, a 16-bit access ignores the
low-order bit of the address when comparing to db;
a 32-bit access ignores the low-order two bits. This
ensures that any access that overlaps the address
contained in the register will generate a trap. The
trap occurs before the register or memory update by
the load or store instruction.
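
The comparison rule can be sketched as follows. This is a minimal illustration, not the hardware logic: the function name is hypothetical, and extending the rule to 8- and 16-byte operands (three and four ignored bits) is an assumption drawn from the two cases the text gives.

```python
def db_matches(access_addr, size_bytes, db):
    """True if an access of size_bytes at access_addr overlaps the
    byte address held in db.

    The number of ignored low-order bits is log2(size_bytes):
    1 bit for a 16-bit access, 2 bits for a 32-bit access.
    """
    ignored = size_bytes.bit_length() - 1
    return (access_addr >> ignored) == (db >> ignored)
```

For example, with db = 0x1003 a 32-bit access at 0x1000 matches, while a 16-bit access at 0x1000 does not because it covers only bytes 0x1000 and 0x1001.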


2.2.6 DIRECTORY BASE REGISTER 


The directory base register dirbase (shown in Figure
2.6) controls address translation, caching, and bus
options.

Figure 2.6. Directory Base Register

• ATE (Address Translation Enable), when set,
enables the virtual-address translation algorithm.

• DPS (DRAM Page Size) controls how many bits
to ignore when comparing the current bus-cycle
address with the previous bus-cycle address to
generate the NENE# signal. This feature allows
for higher speeds when using static column or
page-mode DRAMs and consecutive reads and
writes access the same column or page. The
comparison ignores the low-order 12 + DPS bits.
A value of zero is appropriate for one bank of
256K × n RAMs, 1 for 1M × n RAMs, etc. For
interleaved memory, increase DPS by one for
each power of interleaving: add one for 2-way,
two for 4-way, etc.

• When BL (Bus Lock) is set, external bus accesses
are locked. The LOCK# signal is asserted
with the next bus cycle (excluding instruction
fetch and write-back cycles) whose internal bus
request is generated after BL is set. It remains set
on every subsequent bus cycle as long as BL
remains set. The LOCK# signal is deasserted on
the next load or store instruction after BL is
cleared. Traps immediately clear BL. The lock
and unlock instructions control the BL bit. The
result of modifying BL with the st.c instruction is
not defined.

• ITI (Cache and TLB Invalidate), when set in the
value that is loaded into dirbase, causes all
entries in the instruction cache and virtual tags in
the address-translation cache (TLB) to be
invalidated. It also invalidates all virtual tags in the data
cache. The ITI bit does not remain set in dirbase.
ITI always appears as zero when reading
dirbase.

• When software sets the LB bit, the i860 XP
microprocessor enters two-clock late back-off mode.
This mode gives two additional clock periods of
decision time to the external logic that may need
to use the BOFF# signal to cancel a bus cycle or
data transfer. If the processor enters one-clock
late back-off mode during RESET via configuration
pin strapping, the LB bit has no effect, and it
is impossible to enter two-clock late back-off
mode. Furthermore, software cannot exit
two-clock late back-off mode once it is activated; the
LB bit cannot be cleared except by resetting the
processor.

• When CS8 (Code Size 8-Bit) is set, instruction
cache misses are processed as 8-bit bus cycles.
When this bit is clear, instruction cache misses
are processed as 64-bit bus cycles. This bit cannot
be set by software; hardware sets this bit at
initialization time. It can be cleared by software
(one time only) to allow the system to execute out
of 64-bit memory after bootstrapping from 8-bit
EPROM. A nondelayed branch to code in 64-bit
memory should directly follow the st.c (store
control register) instruction that clears CS8, in order
to make the transition from 8-bit to 64-bit memory
occur at the correct time. The branch instruction
must be aligned on a 64-bit boundary.

• RB (Replacement Block) identifies the cache line
(block) to be replaced by cache replacement
algorithms. RB conditions the cache flush
instruction flush, which is discussed in section 10.
Table 2.3 explains the values of RB.

• RC (Replacement Control) controls cache
replacement algorithms. Table 2.4 explains the
significance of the values of RC.

• DTB (Directory Table Base) contains the
high-order 20 bits of the physical address of the page
directory when address translation is enabled (i.e.
ATE = 1). The low-order 12 bits of the address
are zeros.
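
Two of the dirbase fields above reduce to simple bit arithmetic. The sketch below is an illustration under stated assumptions: all function names are hypothetical, and `same_dram_page` only models the address comparison described for DPS, not the timing of the NENE# signal.

```python
def same_dram_page(prev_addr, cur_addr, dps):
    """Compare two bus-cycle addresses, ignoring the low-order
    12 + DPS bits as the DPS description states."""
    shift = 12 + dps
    return (prev_addr >> shift) == (cur_addr >> shift)


def dps_for_interleave(base_dps, ways):
    """Add one to DPS for each power of two of interleaving."""
    if ways < 1 or ways & (ways - 1):
        raise ValueError("ways must be a power of two")
    return base_dps + ways.bit_length() - 1


def page_directory_address(dtb):
    """Physical address of the page directory: DTB supplies the
    high-order 20 bits; the low-order 12 bits are zeros."""
    return (dtb & 0xFFFFF) << 12
```

For instance, one bank of 1M × n RAMs (DPS = 1) with 4-way interleaving gives `dps_for_interleave(1, 4) == 3`.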


Table 2.3. Values of RB

Value   Replace TLB Block   Replace Instruction and Data Cache Block
00      0                   0
01      1                   1
10      2                   2
11      3                   3


Table 2.4. Values of RC

Value   Meaning
00      Selects the normal (random) replacement algorithm where any
        block in the set may be replaced on cache misses in all caches.
01      Instruction, data, and TLB cache misses replace the block
        selected by RB. This mode is used for cache and TLB testing.
10      Data cache misses replace the block selected by RB. Instruction
        and TLB caches use random replacement. This mode is used when
        flushing the data cache with the flush instruction.
11      Disables data and TLB caches replacement. Instruction cache
        uses random replacement.


2.2.7 FAULT INSTRUCTION REGISTER 


When a trap occurs, this register contains the
address of the trapping instruction (not necessarily the
instruction that created the conditions that required
the trap). The fir is a read-only register. In
single-instruction mode, using a ld.c instruction to read the
fir anytime except the first time after a trap saves in
idest the address of the ld.c instruction; in
dual-instruction mode, the address of its floating-point
companion (address of the ld.c − 4) is saved.


2.2.8 FLOATING-POINT STATUS REGISTER 


The floating-point status register (fsr) contains the
floating-point trap and rounding-mode status for the
current process. Figure 2.7 shows its format.

• If FZ (Flush Zero) is clear and underflow occurs,
a result-exception trap is generated. When FZ is
set and underflow occurs, the result is set to zero,
and no trap due to underflow occurs.

• If TI (Trap Inexact) is clear, inexact results do not
cause a trap. If TI is set, inexact results cause a
trap. The sticky inexact flag (SI) is set whenever
an inexact result is produced, regardless of the
setting of TI.

• RM (Rounding Mode) specifies one of the four
rounding modes defined by the IEEE standard.
Given a true result b that cannot be represented
by the target data type, the i860 XP microprocessor
determines the two representable numbers a
and c that most closely bracket b in value (a <
b < c). The i860 XP microprocessor then rounds
(changes) b to a or c according to the mode
selected by RM as defined in Table 2.5. Rounding
introduces an error in the result that is less than
one least-significant bit.


Table 2.5. Values of RM

Value   Rounding Mode       Rounding Action
00      Round to nearest    Closer to b of a or c; if equally close,
        or even             select even number (the one whose least
                            significant bit is zero).
01      Round down          a
        (toward −∞)
10      Round up            c
        (toward +∞)
11      Chop                Smaller in magnitude of a or c.
        (toward zero)
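
The four rounding actions of Table 2.5 can be illustrated on a simplified number line. The sketch below is not the floating-point hardware: as a stated assumption it treats the integers as the "representable numbers", so a and c are the integers bracketing an exact rational b, and the function name is hypothetical.

```python
import math
from fractions import Fraction


def round_rm(b, rm):
    """Round exact value b to a neighbouring integer according to RM.

    a and c are the representable values (here: integers) that
    bracket b, with a < b < c.
    """
    a = math.floor(b)
    if a == b:                 # b is representable; nothing to round
        return a
    c = a + 1
    if rm == 0b00:             # round to nearest or even
        if b - a < c - b:
            return a
        if c - b < b - a:
            return c
        return a if a % 2 == 0 else c
    if rm == 0b01:             # round down (toward minus infinity)
        return a
    if rm == 0b10:             # round up (toward plus infinity)
        return c
    if rm == 0b11:             # chop (toward zero)
        return a if abs(a) < abs(c) else c
    raise ValueError(rm)
```

For b = −5/2 the four modes select −2, −3, −2 and −2 respectively, which shows how chop differs from round down for negative values.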


[Figure 2.7 field labels recovered: FLUSH ZERO, ROUNDING MODE]

Figure 2.7. Floating-Point Status Register

• The U-bit (Update Bit), if set in the value that is
loaded into fsr by a st.c instruction, enables
updating of the result-status bits (AE, AA, AI, AO,
AU, MA, MI, MO, and MU) in the first stage of the
floating-point adder and multiplier pipelines. If this
bit is clear, the result-status bits are unaffected
by a st.c instruction; st.c ignores the corresponding
bits in the value that is being loaded. An st.c
always updates fsr bits 21..17 and 8..0 directly.
The U-bit does not remain set; it always appears
as zero when read.


• The FTE (Floating-Point Trap Enable) bit, if clear,
disables all floating-point traps (invalid input
operand, overflow, underflow, and inexact result).

• SI (Sticky Inexact) is set when the last-stage
result of either the multiplier or adder is inexact (i.e.
when either AI or MI is set). SI is "sticky" in the
sense that it remains set until reset by software.
AI and MI, on the other hand, can be changed by
the subsequent floating-point instruction.

• SE (Source Exception) is set when one of the
source operands of a floating-point operation is
invalid; it is cleared when all the input operands
are valid. Invalid input operands include
denormals, infinities, and all NaNs (both quiet and
signaling).

• When read from the fsr, the result-status bits MA,
MI, MO, and MU (Multiplier Add-One, Inexact,
Overflow, and Underflow, respectively) describe
the last-stage result of the multiplier.

• When read from the fsr, the result-status bits AA,
AI, AO, AU, and AE (Adder Add-One, Inexact,
Overflow, Underflow, and Exponent, respectively)
describe the last-stage result of the adder. The
high-order three bits of the 11-bit exponent of the
adder result are stored in the AE field.

• The Adder Add-One and Multiplier Add-One bits
indicate that the absolute value of the result
fraction grew by one least-significant bit due to
rounding. AA and MA are not influenced by the
sign of the result.

• After a floating-point operation in a given unit
(adder or multiplier), the result-status bits of that unit
are undefined until the point at which result
exceptions are reported.


• When written to the fsr with the U-bit set, the
result-status bits are placed into the first stage of
the adder and multiplier pipelines. When the
processor executes pipelined operations, it
propagates the result-status bits of a particular unit
(multiplier or adder) one stage for each pipelined
floating-point operation for that unit. When they
reach the last stage, they replace the normal
result-status bits in the fsr and generate traps, if
enabled. When the U-bit is not set, result-status
bits in the word being written to the fsr are
ignored.

• In a floating-point dual-operation instruction (e.g.
add-and-multiply or subtract-and-multiply), both
the multiplier and the adder may set exception
bits. The result-status bits for a particular unit
remain set until the next operation that uses that
unit.

• RR (Result Register) specifies which
floating-point register (f0-f31) was the destination register
when a result-exception trap occurs due to a
scalar operation.

• IRP (Integer (Graphics) Pipe Result Precision),
MRP (Multiplier Pipe Result Precision), and ARP
(Adder Pipe Result Precision) aid in restoring
pipeline state after a trap or process switch. Each
defines the precision of the last-stage result in
the corresponding pipeline. One of these bits is
set when the result in the last stage of the
corresponding pipeline is double precision; it is cleared
if the result is single precision.

• LRP1 and LRP0 (Load Pipe Result Precision)
together define the size of the last-stage result of
the load pipeline. They are encoded as Table 2.6
shows.


Table 2.6. Values of LRP1 and LRP0

LRP1   LRP0   pfld Length
0      0      (reserved)
0      1      4 Bytes
1      0      8 Bytes
1      1      16 Bytes


2.2.9 KR, KI, T, AND MERGE REGISTERS


The KR, KI, and T registers are special-purpose
registers used by the dual-operation floating-point
instructions pfam, pfsm, pfmam, and pfmsm, which
initiate both an adder operation and a multiplier
operation. The KR, KI, and T registers can store values
from one dual-operation instruction and supply them
as inputs to subsequent dual-operation instructions.
(Refer to Figure 2.16.)

The MERGE register is used only by the graphics
instructions. The purpose of the MERGE register is
to accumulate (or merge) the results of
multiple-addition operations that use as operands the
color-intensity values from pixels or distance values from a
Z-buffer. The accumulated results can then be
stored in one 64-bit operation.


Two multiple-addition instructions and an OR
instruction use the MERGE register. The addition
instructions are designed to add interpolation values
to each color-intensity field in an array of pixels or to
each distance value in a Z-buffer.

Refer to the instruction descriptions in section 10 for
more information about these registers.


2.2.10 BUS ERROR ADDRESS REGISTER 


The bear helps the trap handler determine faulty
memory locations. The i860 XP microprocessor
loads a valid address into bear under these
conditions:

• For bus errors, the bear receives the address of
the cycle for which the BERR signal is asserted, if
external hardware asserts BERR in the same
clock as it asserts BRDY# or one clock later.

• For parity errors on a read, the bear receives the
address of the cycle during which the processor
detects the error, if external hardware asserts
PEN# with BRDY# for that cycle.

If external hardware does not meet these conditions,
the contents of the bear are undefined.

A valid address in bear is accurate to 29 bits; that is,
address signals A31-A3 are latched in the high-order
29 bits of bear. At RESET and after every parity
and bus error trap, software must read the bear
before further parity and bus error traps can occur. The
bear is a read-only register.


2.2.11 PRIVILEGED REGISTERS


The registers p0, p1, p2, and p3 are provided for the
operating system to use. They do not affect
processor operation. They can be accessed by the ld.c and
st.c instructions, but they can be written only in
supervisor mode. They may be used to store
information such as the interrupt stack pointer, current user
stack pointer at the beginning of the trap handler,
register values during trap handling, processor ID in
a multiprocessor system, or for any other purpose.


2.2.12 CONCURRENCY CONTROL REGISTER


The concurrency control register (ccr) controls the
operation of the internal Concurrency Control Unit
(CCU), which is described in section 2.5. The ccr
can be written in supervisor mode only, but can be
read in user or supervisor mode. Figure 2.8 shows
the format of the ccr.

The DO (Detached Only) bit and CO (CCU On) bit
together specify the CCU configuration. DO, when set,
indicates that there is no external CCU. CO, when set,
indicates that the Concurrency Control
Architecture is enabled. Table 2.7 summarizes the
modes defined by CO and DO bits. The reserved
combinations should not be used by software.

If the DCCU is on (CO=DO=1), the processor
intercepts and interprets all memory loads and stores
which are to the CCU address space, which is the
two pages defined by CCUBASE. Loads and stores
to that address range do not go to memory but to
the DCCU.


Table 2.7. Values of CO and DO

CO   DO   Mode
0    0    External CCU, or no CCU
0    1    reserved
1    0    reserved
1    1    Internal CCU, detached only


CCUBASE is the virtual address of the memory area
into which the CCU registers are mapped. Software
must set bit 12 to zero, because the CCUBASE must
be aligned on a two page (8 Kbyte) boundary. This is
because an external CCU contains supervisor
registers mapped to the second page.

[Figure 2.8 field labels recovered: CCUBASE, DETACHED ONLY]

Figure 2.8. Concurrency Control Register
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2.2.13 NEWCURR REGISTER 
The NEWCURR register is part of the detached CCU
(concurrency control unit). It is a 32-bit counter that
supplies an iteration count for loop execution. (Refer
to section 2.5.)

NEWCURR is architecturally a 64-bit register, but
only the low-order 32 bits are provided in this
implementation. Compiler and operating-system data
structures should provide for a 64-bit size for future
implementation.


2.2.14 STAT REGISTER


The STAT register is part of the detached CCU
(concurrency control unit). As Figure 2.9 shows, it
contains the following bits:

InLoop     Indicates that the processor is currently
           executing a concurrent loop. This bit is
           set when a processor starts a concurrent,
           non-nested loop, and it is cleared
           when the processor enters serial code
           when not nested or idle. It can also be
           read or written directly.

Nested     Indicates whether the processor is in the
           nested state. InLoop is copied into this
           bit when starting a nested loop. Otherwise,
           it can be read or written directly.

Detached   Always contains the value of ccr bit DO.

STAT is architecturally a 64-bit register. Compiler
and operating-system data structures should provide
for a 64-bit size for future implementation.


2.3 Addressing


Memory is addressed in byte units with a paged
virtual-address space of 2^32 bytes. Data and
instructions can be located anywhere in this address
space. Address arithmetic uses 32-bit input values
and produces 32-bit results. The low-order 32 bits of
the result are used in case of overflow.

Normally, multibyte data values are stored in memory
in little endian format, i.e. with the least significant
byte at the lowest memory address. As an option,
the ordering can be dynamically selected by
software in supervisor mode. The i860 XP microprocessor
also offers big endian mode, in which the most
significant byte of a data item is at the lowest
address. Figure 2.10 defines by example how data is
transferred from memory over the bus into a register
in both modes. Big endian and little endian data
areas should not be mixed within a 64-bit data word.
Illustrations of data structures in this data sheet
show data stored in little endian mode, i.e. the
rightmost (low-order) byte is at the lowest memory
address.


Code accesses are always done with little endian
addressing. This implies that instructions appear
differently than documented here when accessed as
big endian data. Intel Corporation recommends that
disassemblers running in a big endian system
convert instructions that have been read as data back to
little endian form and present them in the format
documented here.

Page directories and page tables are also accessed
in little endian mode, regardless of the setting of the
BE bit.

Big endian mode affects not only the memory load
and store instructions but also the ldio, stio, ldint,
and scyc instructions.


[Figure 2.9 field labels recovered: InLoop, Nested, Detached]


Alignment requirements are as follows (any violation
results in a data-access trap):

• 128-bit values are aligned on 16-byte boundaries
when referenced in memory (i.e. the four least
significant address bits must be zero).

• 64-bit values are aligned on 8-byte boundaries
when referenced in memory (i.e. the three least
significant address bits must be zero).

• 32-bit values are aligned on 4-byte boundaries
when referenced in memory (i.e. the two least
significant address bits must be zero).
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_ Figure 2.9. Concurrency Status Register 
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INSTRUCTION _LITTLE ENDIAN 


[Figure 2.10 shows, for each of the loads listed below, the byte
enables asserted (BEn#) and the data bus lanes (D63-D0) used in
little endian and in big endian mode, together with the layout of
main memory (D63-D0). The per-row values are not recoverable here.]

ld.b 0(r0), r16
ld.b 1(r0), r16
ld.b 2(r0), r16
ld.b 3(r0), r16
ld.b 4(r0), r16
ld.b 5(r0), r16
ld.b 6(r0), r16
ld.b 7(r0), r16

ld.s 0(r0), r16
ld.s 2(r0), r16
ld.s 4(r0), r16
ld.s 6(r0), r16

ld.l 0(r0), r16
ld.l 4(r0), r16

NOTE:
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64- and 128-bit big endian accesses are treated the same as little endian accesses 


Figure 2.10. Little and Big Endian Memory Transfers 


• 16-bit values are aligned on 2-byte boundaries
when referenced in memory (i.e. the least
significant address bit must be zero).
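
The alignment requirements listed in this section amount to a single test on the low-order address bits. A minimal sketch, with a hypothetical function name, that reports whether an access would violate them:

```python
def is_aligned(addr, size_bits):
    """True if addr satisfies the alignment required for an operand
    of size_bits (16, 32, 64 or 128): the low-order
    log2(size_bits / 8) address bits must be zero."""
    if size_bits not in (16, 32, 64, 128):
        raise ValueError("size_bits must be 16, 32, 64 or 128")
    return addr & (size_bits // 8 - 1) == 0
```

An address such as 0x1008 is acceptable for a 64-bit value but not for a 128-bit value, which according to the text would result in a data-access trap.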


2.4 Virtual Addressing


When address translation is enabled, the processor
maps instruction and data virtual addresses into
physical addresses before referencing memory. This
address transformation is compatible with that of the
Intel386 and Intel486 microprocessors and
implements the basic features needed for page-oriented
virtual-memory systems and page-level protection.

The address translation is optional. Address
translation is disabled when the processor is reset. It is
enabled when a store (st.c) to dirbase sets the ATE
bit. The operating system typically does this during
software initialization. Address translation is
disabled again when st.c clears the ATE bit. The ATE
bit must be set if the operating system is to
implement page-oriented protection or page-oriented
virtual memory.


2.4.1 PAGE FRAME


A page frame is a unit of contiguous addresses of
physical main memory. A page is the collection of
data that occupies a page frame when that data is
present in main memory or occupies some location
in secondary storage when there is not sufficient
space in main memory.

The i860 XP microprocessor architecture supports
two sizes of pages and page frames: four Mbytes
and four Kbytes. Four Kbyte page frames begin on
four Kbyte boundaries and are fixed in size. Four
Mbyte page frames begin on four Mbyte boundaries
and are fixed in size. The four Kbyte address
transformation is compatible with that of the Intel486
microprocessor.


2.4.2 VIRTUAL ADDRESS 


A virtual address refers indirectly to a physical
address by specifying a page and an offset within that
page. Figure 2.11 shows the formats of virtual
addresses. The format for virtual addresses that refer
to four Mbyte pages is different from that of four
Kbyte pages.

Figure 2.12 shows how the i860 XP microprocessor
converts a virtual address into the physical address
by consulting page tables. The addressing
mechanism uses the DIR field as an index into a page
directory. For 4K pages, it uses the PAGE field as an
index into the page table determined by the page
directory and uses the OFFSET field to address a
byte within the page determined by the page table.
For 4M pages, the page directory entry determines
the page address, and the OFFSET field addresses
a byte within that page.
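
The two-level lookup described above can be sketched in a few lines. This is an illustration only, with several labeled assumptions: the field widths (DIR = bits 31..22, PAGE = bits 21..12, OFFSET = the remaining low-order bits) follow from 1K-entry tables and the two page sizes; the positions of the P bit (bit 0) and page-size bit (bit 7) are assumed from the Intel386-style entry layout and are not recoverable from this text; `translate` and `read32` are hypothetical names, and protection checks are omitted.

```python
P_BIT = 1 << 0    # present bit (assumed position)
PS_BIT = 1 << 7   # page-size bit, 1 = 4-Mbyte page (assumed position)


def translate(vaddr, dtb, read32):
    """Translate a 32-bit virtual address to a physical address.

    dtb    -- high-order 20 bits of the page directory address
    read32 -- callable returning the 32-bit word at a physical address
    """
    dir_index = (vaddr >> 22) & 0x3FF
    pde = read32((dtb << 12) + 4 * dir_index)
    if not pde & P_BIT:
        raise LookupError("page directory entry not present")
    if pde & PS_BIT:
        # 4-Mbyte page: frame address bits 31..22, 22-bit offset
        return (pde & 0xFFC00000) | (vaddr & 0x003FFFFF)
    page_index = (vaddr >> 12) & 0x3FF
    pte = read32((pde & 0xFFFFF000) + 4 * page_index)
    if not pte & P_BIT:
        raise LookupError("page table entry not present")
    # 4-Kbyte page: frame address bits 31..12, 12-bit offset
    return (pte & 0xFFFFF000) | (vaddr & 0x00000FFF)
```

A dictionary of physical addresses to words is enough to exercise it, for example `translate(0x00403ABC, 0x00001, lambda a: mem.get(a, 0))` with `mem` holding one directory entry and one page table entry.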


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2.4.3 PAGE TABLES 


A page table is simply an array of 32-bit page specifi- 
ers. A page table is itself a page, and contains 
4 Kbytes of data or at most 1K 32-bit entries. 


At the highest level is a page directory. The page 
directory holds up to 1K entries that address either 
page tables of the second level or 4-Mbyte pages. 


A page table of the second level addresses up to 1K 
4-Kbyte pages. All the tables addressed by one 
page directory, therefore, can address 1M 4-Kbyte 
pages. 


Whether 4-Mbyte pages, 4-Kbyte pages, or some
combination of the two are used, one page directory
can cover the entire four gigabyte physical address
space of the i860 XP microprocessor (1K page
directory entries × 4M page or 1K page directory
entries × 1K page table entries × 4K page).
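
The coverage claim is plain arithmetic, checked here as a short worked example (the variable names are illustrative):

```python
K = 1 << 10
M = 1 << 20
G = 1 << 30

# 1K page directory entries, each mapping one 4-Mbyte page
via_4m_pages = K * (4 * M)

# 1K page directory entries x 1K page table entries x 4-Kbyte pages
via_4k_pages = K * K * (4 * K)

# Both products equal the 4-gigabyte (2**32 byte) address space
covers_full_space = via_4m_pages == via_4k_pages == 4 * G == 2 ** 32
```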


[Figure 2.11 field labels recovered: DIR, PAGE, OFFSET (format for
4-Kbyte page); DIR, OFFSET (format for 4-Mbyte page)]
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Figure 2.11. Formats of Virtual Addresses 
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Figure 2.12. Address Translation 
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The physical address of the current page directory is 
stored in the DTB field of the dirbase register.
Memory management software has the option of using
one page directory for all processes, one page
directory for each process, or some combination of the
two.


2.4.4 PAGE-TABLE ENTRIES 
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Page-table entries (PTEs) have one of the formats 


shown by Figure 2.13. 


[Figure 2.13 field labels recovered: PRESENT, WRITABLE, USER, ACCESSED,
PAGE SIZE (0 indicates 4 Kbyte)]
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2.4.4.1 Page Frame Address 


The page frame address specifies the physical start- 
ing address of a page. In a page directory, the page 
frame address is either the address of a page table 
or the address of the four-Mbyte page frame that
contains the desired memory operand. In a second-level
page table, the page frame address is the address
of the 4-Kbyte page frame that contains the
desired memory operand.


[Figure 2.13 labels, continued.
Page directory entry for a 4-Mbyte page: PRESENT, WRITABLE, USER, WRITE-THROUGH, CACHE-DISABLE, ACCESSED, DIRTY, PAGE SIZE (1 indicates 4 Mbyte), AVAILABLE FOR SYSTEMS PROGRAMMER USE, PAGE FRAME ADDRESS 31..22.
Page table entry: PRESENT, WRITABLE, USER, WRITE-THROUGH, CACHE-DISABLE, ACCESSED, DIRTY, AVAILABLE FOR SYSTEMS PROGRAMMER USE, PAGE FRAME ADDRESS 31..12.
The page directory entry for a 4-Kbyte page table also carries AVAILABLE FOR SYSTEMS PROGRAMMER USE and PAGE FRAME ADDRESS 31..12.
Remaining bits: RESERVED BY INTEL CORPORATION (SHOULD BE ZERO).]
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Figure 2.13. Formats of Page Table Entries. 
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2.4.4.2 Present Bit 


The P (present) bit indicates whether a page table
entry can be used in address translation. P = 1 indicates
that the entry can be used. When P = 0 in either
level of page tables, the entry is not valid for
address translation, and the rest of the entry is available
for software use; none of the other bits in the
entry is tested by the hardware. If P = 0 in either level
of page tables when an attempt is made to use a
page-table entry for address translation, the processor
signals either a data-access fault or an instruction-access
fault. In software systems that support
paged virtual memory, the trap handler can bring the
required page into physical memory.


Note that there is no P bit for the page directory 
itself. The page directory may be not-present while 
the associated process is suspended, but the operating
system must ensure that the page directory
indicated by the dirbase image associated with the
process is present in physical memory before the
process is dispatched.


2.4.4.3 Writable and User Bits 


The W (writable) and U (user) bits are used for page- 
level protection, which the i860 XP microprocessor 
performs at the same time as address translation. 
The concept of privilege for pages is implemented 
by assigning each page to one of two levels: 


Supervisor level (U = 0)   For the operating system and other
                           systems software and related data.

User level (U = 1)         For applications procedures and data.


The U bit of the psr indicates whether the i860 XP 
microprocessor is executing at user or supervisor 
level. The i860 XP microprocessor maintains the 
U bit of psr as follows: 


• The i860 XP microprocessor clears the psr U bit
  to indicate supervisor level when a trap occurs
  (including when the trap instruction causes the
  trap). The prior value of U is copied into PU.

• The i860 XP microprocessor copies the psr
  PU bit into the U bit when an indirect branch is
  executed and one of the trap bits is set. If PU was
  one, the i860 XP microprocessor enters user level.


With the U bit of psr and the W and U bits of the 
page table entries, the i860 XP microprocessor im- 
plements the following protection rules: 


• When at user level, a read or write of a supervisor-level
  page causes a trap.

• When at user level, a write to a page whose W bit
  is not set causes a trap.

• When at user level, a store (st.c) to certain control
  registers is ignored.

• When at user level, privileged instructions (ldio,
  stio, scyc, ldint) have no effect.


When the i860 XP microprocessor is executing at 
supervisor level, all pages are addressable, but, 
when it is executing at user level, only pages that 
belong to the user level are addressable. 


When the i860 XP microprocessor is executing at
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supervisor level, all pages are readable. Whether a 
page is writable depends upon the write-protection 
mode controlled by WP of epsr: 


WP = 0   All pages are writable.

WP = 1   A write to a page whose W bit is not set
         causes a trap.


When the i860 XP microprocessor is executing at 
user level, only pages that belong to user level and 
are marked writable are actually writable; pages that 
belong to supervisor level are neither readable nor 
writable from user level. 
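
The protection rules above can be condensed into a single predicate. The C sketch below is a minimal model under the rules as stated in this section; the function name and parameter names are hypothetical.

```c
#include <stdbool.h>

/* Returns true when the access is permitted and false when it would trap.
   psr_u:    U bit of psr (true = user level)
   wp:       WP bit of epsr
   page_u:   U bit of the page (true = user-level page)
   page_w:   W bit of the page (true = writable)
   is_write: true for a write, false for a read */
bool page_access_allowed(bool psr_u, bool wp, bool page_u, bool page_w,
                         bool is_write)
{
    if (psr_u) {
        if (!page_u)
            return false;            /* supervisor page touched from user level */
        if (is_write && !page_w)
            return false;            /* write to a page whose W bit is not set */
        return true;
    }
    /* Supervisor level: every page is readable; writes depend on WP. */
    if (is_write && wp && !page_w)
        return false;
    return true;
}
```

For a four-Kbyte page the same predicate would be applied to the page directory entry and to the page table entry, which yields the more restrictive of the two.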


2.4.4.4 Write-Through Bit 


The i860 XP microprocessor implements both write-back
and write-through caching policies for the on-chip
instruction and data caches. If WT is set, the
write-through policy is applied to data from the cor- 
responding page. If WT is clear, the normal write- 
back policy is applied to data from the page. 


For four-Mbyte pages, the WT bit of the page direc- 
tory entry is used. For four-Kbyte pages, only the WT 
bit of the second-level page table entry is used; the 
WT bit of the page directory entry is not referenced 
by the processor, but is reserved. 


The value of the WT bit is driven externally on the 
PWT pin, so that external caches can employ the 
same policy used internally. 


2.4.4.5 Cache Disable Bit


If a page’s CD (cache disable) bit is set, data from 
the page is not placed in the internal instruction or 
data caches (regardless of the value of the WT bit). 
Clearing CD permits the processor to place data 
from the associated page into internal caches. 


For four-Mbyte pages, the CD bit of the page direc- 
tory entry is used. For four-Kbyte pages, only the CD 
bit of the second-level page table entry is used; the 
CD bit of the page directory entry is not referenced 
by the processor, but is reserved. 


The value of the CD bit is driven externally on the
PCD pin, so that cacheability can be the same in 
both internal and external caches. 
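
The rule for which entry supplies the WT and CD bits can be sketched as follows. Only the page size flag at bit 7 is given in the text (section 2.4.5); the WT and CD positions used here are assumptions inferred from the label order in Figure 2.13, and the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define ENTRY_WT      (1u << 3)  /* assumed position, not stated in the text */
#define ENTRY_CD      (1u << 4)  /* assumed position, not stated in the text */
#define PDE_PAGE_SIZE (1u << 7)  /* bit 7: 1 indicates a four-Mbyte page */

typedef struct {
    bool write_through;
    bool cache_disable;
} cache_attrs;

/* Four-Mbyte pages take WT and CD from the page directory entry;
   four-Kbyte pages take them from the second-level page table entry only. */
cache_attrs effective_cache_attrs(uint32_t pde, uint32_t pte)
{
    uint32_t src = (pde & PDE_PAGE_SIZE) ? pde : pte;
    cache_attrs a = { (src & ENTRY_WT) != 0, (src & ENTRY_CD) != 0 };
    return a;
}
```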


2.4.4.6 Accessed and Dirty Bits 


The A (accessed) and D (dirty) bits provide data 
about page usage in both levels of the page tables. 


The i860 XP microprocessor sets the A-bit before a 
read or write operation to a page. For four-Kbyte
pages, it sets the A-bit of both levels of page tables. 


The processor tests the dirty bit before a write, and, 
under certain conditions, causes traps. The trap 
handler then has the opportunity to maintain appro- 
priate values in the dirty bits. For four-Mbyte pages, 
the D bit of the page directory entry is used. For four- 
Kbyte pages, only the D bit of the second-level page 
table entry is used; the D bit of the page directory 
entry is not referenced by the processor, but is 
reserved. The precise algorithm for using these bits 
is specified in section 2.4.5. 


An operating system that supports paged virtual
memory can use the D and A bits to determine what
pages to eliminate from physical memory when the
demand for memory exceeds the physical memory
available. The D and A bits are normally initialized to
zero by the operating system. The processor sets
the A bit when a page is accessed either by a read
or write operation. When a data-access fault occurs,
the trap handler sets the D bit if an allowable write is
being performed, then reexecutes the instruction.


The operating system is responsible for coordinating
its updates to the accessed and dirty bits with updates
by the CPU and by other processors that may
share the page tables. The i860 XP microprocessor
automatically asserts the LOCK# signal while testing
and setting the A bit.


2.4.4.7 Page Tables for Trap Handlers


When paging is enabled (ATE = 1), software that
creates page tables and directories must assure that
A = 1 always in the PTEs and PDEs for the code
pages of the trap handler and the first data page
accessed by the handler. Preallocation of these
pages is required in case a trap occurs during a lock
sequence. Otherwise, recursive traps would be generated,
as the A-bit would need to be set by the
translation hardware, which is a trapping situation in
itself.


2.4.4.8 Combining Protection of Both Levels of
Page Tables


For any four-Kbyte page, the protection attributes of
its page directory entry may differ from those of its
page table entry. The i860 XP microprocessor computes
the effective protection attributes for a page
by examining the protection attributes in both the
directory and the page table and choosing the more
restrictive of the two.


2.4.5 ADDRESS TRANSLATION ALGORITHM


The following algorithm defines the translation of
each virtual address to a physical address. Let DIR,
PAGE, and OFFSET be the fields of the virtual address;
let PFA1 and PFA2 be the page frame address
fields of the first and second level page tables
respectively; DTB is the page directory table base
address stored in the dirbase register.


 1. Read the PDE (Page Directory Entry) at the
    physical address formed by DTB:DIR:00.

 2. If P in the PDE is zero, generate a data- or
    instruction-access fault.

 3. If W in the PDE is zero, the operation is write,
    and either the U bit of the psr is set or WP = 1,
    generate a data-access fault.

 4. If the U bit in the PDE is zero and the U bit in the psr
    is set, generate a data- or instruction-access
    fault.

 5. If A in the PDE is zero and the TLB miss occurred
    inside a locked sequence, generate a
    data- or instruction-access fault. (The trap allows
    software to set A to one and restart the sequence.
    This helps external bus hardware determine
    unambiguously what address corresponds
    to a locked semaphore.)

 6. If bit 7 of the PDE is one (four Mbyte page), and
    the operation is write, and D = 0 in the PDE,
    generate a data-access fault.

 7. If A = 1 in the PDE, continue at step 11. Otherwise,
    assert LOCK#.

 8. Perform the PDE read as in step 1 and the P, W
    and U bit checks as in steps 2 through 4.

 9. Write the PDE with A bit set.

10. Deassert LOCK#.

11. If bit 7 of the PDE is one (four Mbyte page), form
    the physical address as PFA1:OFFSET, and exit
    address translation. In this case, PFA1 is 10 bits
    and OFFSET is 22 bits.

12. The remaining steps are for four Kbyte pages. If
    the A-bit in the PDE was zero before translation
    began, assert LOCK#.

13. Fetch the PTE at the physical address formed
    by PFA1:PAGE:00.

14. Perform the P-, W-, U-, and A-bit checks as in
    steps 2 through 5 with the second-level PTE. If
    A is zero in the PTE, and the TLB miss occurred
    inside a locked sequence, generate a
    data- or instruction-access fault. LOCK# remains
    active.

15. If the operation is write, and D in the PTE is
    zero, generate a data-access fault.

16. If the A-bit in the PDE was already active before
    translation began, and the A-bit in the PTE is
    already active, go to step 20.

17. If LOCK# is not already active, assert it and
    refetch the PTE.

18. Perform the U-, W-, and P-bit checks and A-bit
    setting in the PTE as in steps 8 through 9. Do
    the locked write update of the PTE to unlock the
    bus, even if the A-bit in the PTE is already one.

19. Deassert LOCK#.

20. Form the physical address as PFA2:OFFSET. In
    this case, PFA2 is 20 bits and OFFSET is 12
    bits.
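
The steps above can be sketched in C as a simplified software model. This is not the processor's implementation: the LOCK# protocol and refetches (steps 7 through 10 and 16 through 19) are reduced to a plain A-bit update, the locked-sequence fault of step 5 is omitted, and there is no TLB. Only the page size flag at bit 7 comes from the text; the other bit positions are assumptions inferred from the label order in Figure 2.13. The `phys_mem` type and all function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define E_P  (1u << 0)   /* assumed position */
#define E_W  (1u << 1)   /* assumed position */
#define E_U  (1u << 2)   /* assumed position */
#define E_A  (1u << 5)   /* assumed position */
#define E_D  (1u << 6)   /* assumed position */
#define E_4M (1u << 7)   /* stated in the text: bit 7 selects a 4-Mbyte page */

/* Hypothetical model of physical memory: word[i] is the 32-bit value
   stored at byte address 4*i. */
typedef struct { uint32_t *word; size_t nwords; } phys_mem;

/* Steps 2 through 4, plus the D-bit test of steps 6 and 15 when check_d
   is nonzero. Returns nonzero where the processor would fault. */
static int entry_faults(uint32_t e, int is_write, int psr_u, int wp, int check_d)
{
    if (!(e & E_P)) return 1;
    if (!(e & E_W) && is_write && (psr_u || wp)) return 1;
    if (!(e & E_U) && psr_u) return 1;
    if (check_d && is_write && !(e & E_D)) return 1;
    return 0;
}

/* Returns 0 and stores the physical address in *pa, or returns -1 where
   the processor would signal a fault. dtb is the physical address of the
   page directory. */
int translate(phys_mem *m, uint32_t dtb, uint32_t va,
              int is_write, int psr_u, int wp, uint32_t *pa)
{
    uint32_t dir  = va >> 22;
    uint32_t page = (va >> 12) & 0x3FFu;

    size_t pde_at = (size_t)((dtb & 0xFFFFF000u) >> 2) + dir;  /* DTB:DIR:00 */
    if (pde_at >= m->nwords) return -1;        /* outside this small model */
    uint32_t pde = m->word[pde_at];
    int big = (pde & E_4M) != 0;
    if (entry_faults(pde, is_write, psr_u, wp, big)) return -1;
    m->word[pde_at] = pde | E_A;               /* A-bit update, unlocked here */

    if (big) {                                 /* step 11 */
        *pa = (pde & 0xFFC00000u) | (va & 0x003FFFFFu);
        return 0;
    }

    size_t pte_at = (size_t)((pde & 0xFFFFF000u) >> 2) + page; /* PFA1:PAGE:00 */
    if (pte_at >= m->nwords) return -1;
    uint32_t pte = m->word[pte_at];
    if (entry_faults(pte, is_write, psr_u, wp, 1)) return -1;  /* steps 14, 15 */
    m->word[pte_at] = pte | E_A;
    *pa = (pte & 0xFFFFF000u) | (va & 0x00000FFFu);            /* step 20 */
    return 0;
}

/* Usage: page directory at physical 0, one page table at 0x1000. A user-level
   write to virtual 0x00402345 (DIR = 1, PAGE = 2, OFFSET = 0x345) maps to
   frame 0x00ABC000. Returns the physical address, or 0 on a fault. */
uint32_t demo_translate_4k(void)
{
    static uint32_t ram[2 * 1024];
    phys_mem m = { ram, 2 * 1024 };
    uint32_t pa = 0;

    ram[1]        = 0x00001000u | E_P | E_W | E_U;
    ram[1024 + 2] = 0x00ABC000u | E_P | E_W | E_U | E_D;
    if (translate(&m, 0, 0x00402345u, 1, 1, 0, &pa) != 0)
        return 0;
    return pa;
}
```

Because the same checks run against both the directory entry and the page table entry, an access succeeds only when both levels permit it.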


During translation, the i860 XP microprocessor looks 
only in external memory for page directories and 
page tables. The data cache is not searched. Therefore,
any code that modifies page directories or
page tables must keep them out of the cache. The 
tables should either be kept in noncacheable memo- 
ry or in write-through pages or should be flushed 
from the cache. 


The i860 XP microprocessor expects page directo- 
ries and page tables to be in little endian format. The 
operating system must maintain these tables in little 
endian format either by setting BE to zero when ma- 
nipulating the tables or by complementing bit two of 
the 32-bit address when loading or storing entries. 
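
The second option, complementing bit two of the address, can be sketched as below. The function name is hypothetical, and the sketch assumes 32-bit entry accesses as described in the text.

```c
#include <stdint.h>

/* Address to use for a 32-bit load or store of a page directory or page
   table entry.
   entry_addr: address of the entry in the little endian table layout
   be:         current BE setting (nonzero = big endian mode)
   In big endian mode, bit two of the address is complemented. */
uint32_t table_entry_access_addr(uint32_t entry_addr, int be)
{
    return be ? (entry_addr ^ 0x4u) : entry_addr;
}
```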


2.4.6 ADDRESS TRANSLATION FAULTS

The address translation fault can be signalled as either
an instruction-access fault or a data-access
fault. The instruction causing the fault can be reexecuted
upon returning from the trap handler.


2.5 Detached CCU


The i860 XP microprocessor supports parallel pro- 


cessing, where multiple processors work simulta- 
neously on different parts of the same problem. The 
Concurrency Control Unit (CCU) controls work shar- 
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ing among CPUs, in multiprocessor systems. The 
CCU is a VLSI chip that allows multiple processors 
to work together to execute portions of a single pro- 
gram in parallel. The CCU performs the iteration as- 
signment for loop parallelization. Accesses to the 
CCU for synchronization are much faster than ac- 
cesses to shared memory semaphores. The CCU is 
memory mapped, and its internal registers are ac- 
cessed via memory load and store operations. 


To take advantage of the parallel architecture, soft- 
ware must be compiled by parallelizing compilers 
that generate instructions to access the CCU. How- 
ever, such instructions cannot run on a system that 
does not include a CCU. To allow an application 
compiled for parallel execution to run on any system 
based on the i860 XP microprocessor, a “Detached
Only” CCU (DCCU, also referred to as “internal
CCU”) is implemented in the i860 XP microproces- 
sor. The DCCU is a compatible subset of the exter- 
nal CCU, consisting of the minimal set of features 
required for a single CPU. The DCCU alone neither 
increases performance nor concurrency, but does 
allow software designed for parallel processing to 
run unmodified on a single CPU. 


2.5.1 DCCU INITIALIZATION


After reset, the i860 XP microprocessor DCCU is disabled
(CO and DO bits in ccr are cleared). To enable
the DCCU, the CO and DO bits in ccr must be
set by software. Before turning on the CCU, the operating
system must invalidate the TLB and flush the
data cache to make sure that they do not contain
data from the CCU pages. The TLB is invalidated by
setting ITI = 1 in the dirbase register. Also, the
flush instruction must be used once for each line of
the data cache to invalidate the physical address of
the cache entry, if the two pages at the CCUBASE
address may have been cached. The flush is unneeded
if page tables or external hardware have
prohibited caching of the CCUBASE pages.


Neither the external CCU nor the DCCU can be accessed
within four instructions after ccr is modified.


2.5.2 DCCU ADDRESSING 


The CCU facilities are memory-mapped, manipulated
by normal load and store instructions. The DCCU
is memory-mapped to a single 4 Kbyte user page.
When the DCCU is active, all accesses to this page
are satisfied by the DCCU, and no external bus cycle
is generated. The address space of two adjacent
pages beginning on an 8 Kbyte boundary is reserved
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for the CCU. The first (lower address) page contains 
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locations accessible in user mode (which includes 
the DCCU registers), and the second page contains
locations accessible in supervisor mode (used for
external CCU only). The base address of these
pages is specified by the CCUBASE field in ccr. Accesses
to the second page in DCCU-only mode
have no effect on the DCCU, and are treated as
normal memory accesses.
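
The layout of the two reserved pages can be expressed with a few address helpers. In this C sketch, `base` is the byte address of the first CCU page; how the CCUBASE field encodes that address is not modeled, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BYTES 4096u

/* The two CCU pages begin on an 8 Kbyte boundary. */
bool ccu_base_is_aligned(uint32_t base)
{
    return (base & (2u * PAGE_BYTES - 1u)) == 0;
}

/* First (lower address) page: user-mode locations, including the DCCU registers. */
uint32_t ccu_user_page(uint32_t base)
{
    return base;
}

/* Second page: supervisor-mode locations, used for an external CCU only. */
uint32_t ccu_supervisor_page(uint32_t base)
{
    return base + PAGE_BYTES;
}

/* True when addr falls in the page served by the DCCU. Unsigned
   subtraction wraps, so addresses below base are rejected as well. */
bool in_dccu_page(uint32_t base, uint32_t addr)
{
    return addr - base < PAGE_BYTES;
}
```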


When the DCCU is active, accesses to its address
page use only the virtual address, and no translation
is done on the DCCU access. However, the accesses
to an external CCU go through normal address
translation. The operating system should make sure
that the page table entries for the CCU pages are
set so that no fault occurs during address translation.
If an external CCU is used, the two PTEs for the
CCU should have CD = 1 (caching disabled) and
page frame addresses that match the external hardware
addresses of the CCU. Accesses to the DCCU
that cause a TLB miss do not cause the PTE to be
loaded into the TLB.


If the external CCU is used when address translation 
is disabled (ATE = 0), external hardware must deactivate
KEN# for such accesses, to avoid caching
external CCU accesses. 


2.5.3 DCCU INTERNALS


The DCCU consists of an address decoder, a 32-bit
counter (NEWCURR), and three bits of state information
(InLoop, Nested, and Detached). InLoop,
Nested and Detached correspond to bits 0, 1, and 2
respectively of the external CCU STAT register. The
Detached bit always reflects the value of the DO bit
in ccr.


Several addresses within the DCCU memory page
are decoded to cause actions to NEWCURR, InLoop,
and Nested state bits. The CCU register to be
accessed is specified by address bits 11-3. The valid
CCU addresses are shown in Table 2.8 with their
mnemonics. Accesses to these addresses may also
have side effects within the DCCU. Refer to the
i860™ Microprocessor Family Programmer’s Reference
Manual for programming information. Loads
from any other addresses within the DCCU memory
page return zero; stores to any other addresses
have no effect. Access to the DCCU by any load or
store instructions other than ld.x and st.x produces
undefined results.


Assemblers should encode address bits 2-0 as zero
for accesses in little-endian mode. However, in big-endian
mode (epsr BE bit = 1), DCCU accesses
should have address bit 2 active. Thus, software for
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big-endian access to the DCCU must differ from lit- 
tle-endian software. That allows an external CCU to 
be accessed in both big and little endian modes. 
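
The address rules for DCCU accesses can be sketched as follows: the register is selected by address bits 11-3, bits 2-0 are zero in little-endian mode, and bit 2 is set in big-endian mode. The function names are hypothetical, and the selector values of the individual registers (Table 2.8) are not reproduced here.

```c
#include <stdint.h>

/* Register selector carried in address bits 11-3 of a DCCU access. */
uint32_t dccu_register_select(uint32_t addr)
{
    return (addr >> 3) & 0x1FFu;
}

/* Address for an access to the register with the given selector.
   page_base: base address of the DCCU page (4 Kbyte aligned)
   be:        nonzero when the epsr BE bit is 1 (big-endian mode) */
uint32_t dccu_access_addr(uint32_t page_base, uint32_t reg_select, int be)
{
    uint32_t addr = page_base | ((reg_select & 0x1FFu) << 3);
    return be ? (addr | 0x4u) : addr;
}
```

The selector is the same in both modes because bit 2 lies below the decoded field.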


When reading from the DCCU, the access latency is 
the same as reading data from the data cache—the 
data is ready for use as a source by the second 
instruction after the load. The first instruction after 
the load may use the data, but that instruction will 
experience a one-clock freeze before the data be- 
comes available. 


2.6 Instruction Set 


Table 2.9 shows the complete set of instructions for 
the i860 XP microprocessor, grouped by function 
within processing unit. Refer to Section 10 for an 
algorithmic definition of each instruction. The in- 
struction set of the i860 XP microprocessor is fully 
upward compatible with that of the i860 XR micro- 
processor, extended in a few ways to better serve 
certain application domains. User-level software ap- 
plications written for the i860 XR microprocessor will 
run unmodified on the i860 XP microprocessor, but 
some supervisor code (for example, trap handlers)
may need minor modifications. The i860 XR micro- 
processor instruction set has been extended with 
the following instructions: 


• ldio, stio: I/O load and store instructions.

• ldint: Load interrupt instruction to perform an interrupt
  acknowledge cycle and read the interrupt
  vector. Used to emulate the Intel 486 interrupt
  acknowledge sequence.

• scyc: A special-cycle instruction, used to generate
  bus cycles that signal invalidation and synchronization
  of an external cache.

• pfld.q: A pipelined, floating-point load of 128 bits.


Table 2.8. CCU Addresses


Mnemonic | A11-A8 | A7-A4 | Little Endian A3-A0 | Big Endian A3-A0


cbr__/ 
eget 


Ccnewcurr — 
cstat 
cstatci 
cstatn 

cclm 

cver 


NOTE:
Variable i is a 4-bit index formed by A6-A3. Let its binary
form be represented by the symbols abcd.
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Table 2.9. Instruction Set (1 of 2) 


Core Unit

Mnemonic     Description

Load and Store Instructions
ld.x         Load integer
st.x         Store integer
fld.y        F-P load
fst.y        F-P store
pfld.z       Pipelined F-P load
pst.d        Pixel store

Register to Register Move
ixfr         Transfer integer to F-P register

Integer Arithmetic Instructions
addu         Add unsigned
adds         Add signed
subu         Subtract unsigned
subs         Subtract signed

Shift Instructions
shl          Shift left
shr          Shift right
shra         Shift right arithmetic
shrd         Shift right double

Logical Instructions
and          Logical AND
andh         Logical AND high
andnot       Logical AND NOT
andnoth      Logical AND NOT high
or           Logical OR
orh          Logical OR high
xor          Logical exclusive OR
xorh         Logical exclusive OR high

Control-Transfer Instructions
br           Branch direct
bri          Branch indirect
bc           Branch on CC
bc.t         Branch on CC taken
bnc          Branch on not CC
bnc.t        Branch on not CC taken
bte          Branch if equal
btne         Branch if not equal
bla          Branch on LCC and add
call         Subroutine call
calli        Indirect subroutine call
intovr       Software trap on integer overflow
trap         Software trap


Floating-Point Unit

Mnemonic     Description

Register to Register Move
fxfr         Transfer F-P to integer register

F-P Multiplier Instructions
fmul.p       F-P multiply
pfmul.p      Pipelined F-P multiply
pfmul3.dd    3-Stage pipelined F-P multiply
fmlow.p      F-P multiply low
frcp.p       F-P reciprocal
frsqr.p      F-P reciprocal square root

F-P Adder Instructions
fadd.p       F-P add
pfadd.p      Pipelined F-P add
famov.r      F-P adder move
pfamov.r     Pipelined F-P adder move
fsub.p       F-P subtract
pfsub.p      Pipelined F-P subtract
pfgt.p       Pipelined greater-than compare
pfeq.p       Pipelined equal compare
fix.v        F-P to integer conversion
pfix.v       Pipelined F-P to integer conversion
ftrunc.v     F-P to integer truncation

Dual-Operation Instructions
pfam.p       Pipelined F-P add and multiply
pfsm.p       Pipelined F-P subtract and multiply
pfmam.p      Pipelined F-P multiply with add
pfmsm.p      Pipelined F-P multiply with subtract

Long Integer Instructions
fisub.z      Long-integer subtract
pfisub.z     Pipelined long-integer subtract
fiadd.z      Long-integer add
pfiadd.z     Pipelined long-integer add

Graphics Instructions
fzchks       16-bit Z-buffer check
pfzchks      Pipelined 16-bit Z-buffer check
fzchkl       32-bit Z-buffer check
pfzchkl      Pipelined 32-bit Z-buffer check
faddp        Add with pixel merge
pfaddp       Pipelined add with pixel merge
faddz        Add with Z merge
pfaddz       Pipelined add with Z merge
form         OR with MERGE register
pform        Pipelined OR with MERGE register
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Table 2.9. Instruction Set (2 of 2) 


Mnemonic     Description

I/O Instructions
ldio.x       Load I/O
stio.x       Store I/O
ldint.x      Load interrupt vector

System Control Instructions
flush        Cache flush
ld.c         Load from control register
st.c         Store to control register
lock         Begin interlocked sequence
unlock       End interlocked sequence
scyc.x       Special bus cycles

Assembler Pseudo-Operations

Register to Register Move
mov          Integer move
fmov.r       F-P reg-reg move
pfmov.r      Pipelined F-P reg-reg move

nop          Core no-operation
fnop         F-P no-operation
pfle.p       Pipelined F-P less-than or equal


The architecture of the i860 XP microprocessor uses
parallelism to increase the rate at which operations
may be introduced into the unit. Parallelism in the
i860 XP microprocessor is not transparent; rather,
programmers have complete control over parallelism
and therefore can achieve maximum performance
for a variety of computational problems.


2.6.1 PIPELINED AND SCALAR OPERATIONS 


One type of parallelism used within the floating-point
unit is “pipelining”. The pipelined architecture treats
each operation as a series of more primitive operations
(called “stages”) that can be executed in parallel.
Consider just the floating-point adder as an example.
Let A represent the operation of the adder.
Let the stages be represented by A1, A2, and A3.
The stages are designed such that A(i+1) for one adder
instruction can execute in parallel with A(i) for the
next adder instruction. Furthermore, each A(i) can be
executed in just one clock. The pipelining within the
multiplier and graphics units can be described similarly,
except that the number of stages may be different.


Figure 2.14 illustrates three-stage pipelining as
found in the floating-point adder (also in the floating-point
multiplier when single-precision input operands
are employed). The central columns of the table represent
the three stages of the pipeline. Each stage
holds intermediate results and also (when introduced
into the first stage by software) holds status
information pertaining to those results. The table assumes
that the instruction stream consists of a series
of consecutive floating-point instructions, all of
one type (i.e. all adder instructions or all single-precision
multiplier instructions). The instructions are represented
as A, B, etc. The rows of the table represent
the states of the unit at successive clock cycles.
Each time a pipelined operation is performed,
the result of the last stage of the pipeline is stored in
the destination register fdest, the pipeline is advanced
one stage, and the input operands of the
operation are transferred to the first stage of the
pipeline.
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In the i860 XP microprocessor, the number of pipe- 
line stages ranges from one to three. A pipelined 
operation with a three-stage pipeline stores the re- 
sult of the third prior operation. A pipelined operation 
with a two-stage pipeline stores the result of the sec- 
ond prior operation. A pipelined operation with a 
one-stage pipeline stores the result of the prior operation.
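
The "result of the Nth prior operation" behavior can be illustrated with a small software model. This C sketch is not a description of the hardware: it treats each operation's input value as its eventual result (no arithmetic is performed), and the type and function names are hypothetical.

```c
#define MAX_STAGES 3

typedef struct {
    double stage[MAX_STAGES];  /* stage[0] is the first stage */
    int nstages;               /* 1 to 3 */
} fp_pipe;

/* One pipelined operation: the value leaving the last stage is returned
   (this is what would be stored in fdest), every stage advances by one,
   and the new input enters the first stage. */
double pipe_step(fp_pipe *p, double input)
{
    double out = p->stage[p->nstages - 1];
    for (int i = p->nstages - 1; i > 0; i--)
        p->stage[i] = p->stage[i - 1];
    p->stage[0] = input;
    return out;
}

/* Usage: issue four operations into a three-stage pipeline. The fourth
   operation returns the value introduced by the first, that is, the
   result of the third prior operation. */
double demo_fourth_result(void)
{
    fp_pipe p = { { 0.0, 0.0, 0.0 }, 3 };
    pipe_step(&p, 1.0);
    pipe_step(&p, 2.0);
    pipe_step(&p, 3.0);
    return pipe_step(&p, 4.0);
}
```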


There are four floating-point pipelines: one for the 
multiplier, one for the adder, one for the graphics 
unit, and one for floating-point loads. The adder 
pipeline has three stages. The number of stages in 
the multiplier pipeline depends on the precision of 
the source operands in the pipeline; it may have two 
or three stages. The graphics unit has one stage for 
all precisions. The load pipeline has three stages for 
all precisions. 


Changing the FZ (flush zero), RM (rounding mode), 
or RR (result register) bits of fsr while there are re- 
sults in either the multiplier or adder pipeline produc- 
es effects that are not defined. 


2.6.1.1 Scalar Mode 


In addition to the pipelined execution mode, the 
i860 XP microprocessor also can execute floating-point
instructions in “scalar” mode. Most floating-point
instructions have both pipelined and scalar
variants, distinguished by a bit in the instruction en- 
coding. In scalar mode, the floating-point unit does 
not start a new operation until the previous floating- 
point operation is completed. The scalar operation 
passes through all stages of its pipeline before a 
new operation is introduced, and the result is stored 
automatically. Scalar mode is used when the next 
operation depends on results from the previous few 
floating-point operations (or when the compiler or 
programmer does not want to deal with pipelining). 


2.6.1.2 Pipelining Status Information 


Result status information in the fsr consists of the 
AA, Al, AO, AU, and AE bits, in the case of the ad- 
der, and the MA, MI, MO, and MU bits, in the case of 
the multiplier. This information arrives at the fsr via 
the pipeline in one of two ways: | 


1. It is calculated by the last stage of the pipeline.
   This is the normal case.

2. It is propagated from the first stage of the pipeline.
   This method is used when restoring the
   state of the pipeline after a preemption. When a
   store instruction updates the fsr and the U bit
   being written into the fsr is set, the store updates
   the result status bits in the first stage of both the
   adder and multiplier pipelines. When software
   changes the result-status bits of the first stage of
   a particular unit (multiplier or adder), the updated
   result-status bits are propagated one stage for
   each pipelined floating-point operation for that
   unit. In this case, each stage of the adder and
   multiplier pipelines holds its own copy of the relevant
   bits of the fsr. When they reach the last
   stage, they override the normal result-status bits
   computed from the last-stage result.


At the next floating-point instruction (or at certain
core instructions), after the result reaches the last
stage, the i860 XP microprocessor traps if any of the
status bits of the fsr indicate exceptions. Note that
the instruction that creates the exceptional condition
is not the instruction at which the trap occurs.


2.6.1.3 Precision in the Pipelines


In pipelined mode, when a floating-point operation is
initiated, the result of an earlier pipelined floating-point
operation is returned. The result precision of
the current instruction applies to the operation being
initiated. The precision of the value stored in fdest is
that which was specified by the instruction that initiated
that operation.


If fdest is the same as fsrc1 or fsrc2, the value being
stored in fdest is used as the input operand. In this
case, the precision of fdest must be the same as the
source precision.


The multiplier pipeline has two stages when the
source operands are double-precision and three
stages when they are single. This means that a pipelined
multiplier operation stores the result of the second
previous multiplier operation for double-precision
inputs and third previous for single-precision inputs
(except when changing precisions).


2.6.1.4 Transition between Scalar and Pipelined
Operations


When a scalar operation is executed, it passes
through all stages of the pipeline; therefore, any unstored
results in the affected pipeline are lost. To
avoid losing information, the last pipelined operations
before a scalar operation should be dummy
pipelined operations that unload unstored results
from the affected pipeline.
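
Since each pipelined operation advances the pipeline by one stage, unloading a full pipeline takes as many dummy pipelined operations as the pipeline has stages. The C sketch below encodes the stage counts given in section 2.6.1; the names are hypothetical, and the multiplier case ignores the "changing precisions" exception noted above.

```c
typedef enum { PIPE_ADDER, PIPE_MULTIPLIER, PIPE_GRAPHICS, PIPE_LOAD } pipe_kind;

/* Stage counts as given in the text: adder 3; multiplier 2 for
   double-precision sources and 3 for single; graphics 1; load 3. */
int pipe_stages(pipe_kind k, int double_precision_sources)
{
    switch (k) {
    case PIPE_ADDER:      return 3;
    case PIPE_MULTIPLIER: return double_precision_sources ? 2 : 3;
    case PIPE_GRAPHICS:   return 1;
    case PIPE_LOAD:       return 3;
    }
    return 0;
}

/* Dummy pipelined operations needed after the last real pipelined
   operation so that every unstored result has been stored before a
   scalar operation is issued to the same unit. */
int dummy_ops_to_drain(pipe_kind k, int double_precision_sources)
{
    return pipe_stages(k, double_precision_sources);
}
```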


After a scalar operation, the values of all pipeline
stages of the affected unit (except the last) are undefined.
No spurious result-exception traps result
when the undefined values are subsequently stored
by pipelined operations; however, the values should
not be referenced as source operands.
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For best performance a scalar operation should not 
immediately precede a pipelined operation whose 
fdest is nonzero.


2.6.1.5 Pipelined Loads 


The pfld instruction is optimized for accesses that
miss the data cache and transfer directly from memory.
Therefore, even when there is a data cache hit,
a pfld may generate a bus cycle. The data from the
internal cache is used only if it was modified. Otherwise,
data is taken from the external bus, even if it
resides in the on-board cache.


The pfld FIFO can be extended externally, due to 
the facts that a pfld always generates a bus cycle 
and that such a cycle can be identified externally by 
the value on the CTYP pin. Software written for an 
externally-extended pfld pipeline must ensure that it 
does not pfld from a location that was modified in 
the data cache. When a pfld cache hit to a modified
line occurs, the pfld pipeline length used by the
i860 XP microprocessor is three stages. The modified
data from the cache is put into the internal
three-stage data FIFO, and the third pfld instruction
after the data cache hit will update its fdest register
with the modified data.


2.6.2 DUAL-INSTRUCTION MODE 


Another form of parallelism results from the fact that
the i860 XP microprocessor can execute both a
floating-point and a core instruction simultaneously.
Such parallel execution is called dual-instruction
mode. When executing in dual-instruction mode, the
instruction sequence consists of 64-bit aligned instruction
pairs, with a floating-point instruction in the
lower 32 bits and a core instruction in the upper 32
bits. Table 2.9 identifies which instructions are executed
by the core unit and which by the floating-point unit.


Programmers specify dual-instruction mode either
by including a d. prefix in the mnemonic of a
floating-point instruction or by using the Assembler
directives .dual ... .enddual. Both specifications
cause the D-bit of floating-point instructions to be
set. If the i860 XP microprocessor is executing in
single-instruction mode and encounters a
floating-point instruction with the D-bit set, one more
32-bit instruction is executed before dual-mode
execution begins. If the i860 XP microprocessor is
executing in dual-instruction mode and a floating-point
instruction is encountered with a clear D-bit, then one
more pair of instructions is executed before resuming
single-instruction mode. Figure 2.15 illustrates two
variations of this sequence of events: one for extended
sequences of dual-instructions and one for a single
instruction pair.


Note that d.fnop cannot be used to initiate dual in- 
struction mode. 
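

The two transition rules above can be read as a delay of one issue
slot between seeing a D-bit and changing mode. The following Python
sketch illustrates that reading only; it is not taken from the data
sheet. The function name issue_modes and the encoding of each issue
slot as 1, 0, or None (no floating-point instruction in the slot) are
assumptions made for this example, as is the choice to leave the
pending mode unchanged when a slot carries no D-bit.

```python
def issue_modes(d_bits):
    """Return the mode ("single" or "dual") used for each issue slot.

    d_bits[i] is the D-bit of the floating-point instruction issued in
    slot i, or None when the slot holds only a core instruction.
    Assumed reading of the text: the D-bit seen in slot i decides the
    mode of slot i + 2, so exactly one more slot runs in the old mode.
    """
    n = len(d_bits)
    modes = ["single"] * (n + 2)      # execution starts in single mode
    for i, d in enumerate(d_bits):
        if d is None:
            modes[i + 2] = modes[i + 1]   # no D-bit: nothing changes
        else:
            modes[i + 2] = "dual" if d else "single"
    return modes[:n]


# Extended sequence: op, d.fp-op, core-op, two d.fp pairs,
# two pairs with a clear D-bit, then a single op.
extended = issue_modes([None, 1, None, 1, 1, 0, 0, None])

# Temporary dual mode: d.fp-op, fp-op, one pair, then single ops.
temporary = issue_modes([1, 0, 0, None])
```

In the extended case the first pair with a clear D-bit still runs in
dual mode and is followed by one more pair, which matches the "one
more pair" wording above.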


Figure 2.15. Dual-Instruction Mode Transitions (1 of 2)
(extended sequence; stages labeled Enter Dual Instruction Mode,
Initiate Exit from Dual Instruction Mode, and Leave Dual
Instruction Mode)


Figure 2.15. Dual-Instruction Mode Transitions (2 of 2)
(single instruction pair; labeled Temporary Dual Instruction Mode)


When a 64-bit dual-instruction pair sequentially
follows a delayed branch instruction in dual-instruction
mode, both 32-bit instructions are executed.


2.6.3 DUAL-OPERATION INSTRUCTIONS 


Special dual-operation floating-point instructions 
(add-and-multiply, subtract-and-multiply) use both 
the multiplier and adder units within the floating- 
point unit in parallel to efficiently execute such com- 
mon tasks as evaluating systems of linear equa- 
tions, performing the Fast Fourier Transform (FFT), 
and performing graphics transformations. 


The instruction classes pfam fsrc1, fsrc2, fdest,
pfmam fsrc1, fsrc2, fdest (add and multiply), pfsm
fsrc1, fsrc2, fdest, and pfmsm fsrc1, fsrc2, fdest
(subtract and multiply) initiate both an adder
operation and a multiplier operation. Six operands are
required, but the instruction format specifies only
three operands; therefore, there are special provisions
for specifying the operands. These special provisions
consist of:


• Three special registers (KR, KI, and T) that can
  store values from one dual-operation instruction
  and supply them as inputs to subsequent
  dual-operation instructions.

  - The constant registers KR and KI can store
    the value of fsrc1 and subsequently supply
    that value to the multiplier pipeline in place of
    fsrc1.

  - The transfer register T can store the last-stage
    result of the multiplier pipeline and
    subsequently supply that value to the adder
    pipeline in place of fsrc1.


• A four-bit data-path control field in the opcode
  (DPC) that specifies the operands and loading of
  the special registers.


  1. Operand-1 of the multiplier can be KR, KI, or
     fsrc1.


  2. Operand-2 of the multiplier can be fsrc2, the
     last-stage result of the multiplier pipeline, or
     the last-stage result of the adder pipeline.


  3. Operand-1 of the adder can be fsrc1, the
     T-register, the last-stage result of the multiplier
     pipeline, or the last-stage result of the adder
     pipeline.


  4. Operand-2 of the adder can be fsrc2, the
     last-stage result of the multiplier pipeline, or the
     last-stage result of the adder pipeline.


Figure 2.16 shows all the possible data paths
surrounding the adder and multiplier. The DPC field in
these instructions selects different data paths.
Section 10 shows the various encodings of the DPC
field.


Note that the mnemonics pfam.p, pfsm.p, 
pfmam.p, and pfmsm.p are never used as such in 
the assembly language; these mnemonics are used 
here to designate classes of related instructions. 
Each value of DPC has a unique mnemonic associ- 
ated with it. 


Figure 2.16. Dual-Operation Data Paths
(panels: Single Precision, 3-Stage Multiplier and Adder;
Double Precision, 2-Stage Multiplier, 3-Stage Adder)
2.7 Addressing Modes


Data access is limited to load and store instructions.
Memory addresses are computed from two fields of
load and store instructions: isrc1 and isrc2.


1. isrc1 either contains the identifier of a 32-bit
   integer register or contains an immediate 16-bit
   address offset.


2. isrc2 always specifies a register.


Because either isrc1 or isrc2 may be null (zero), a
variety of useful addressing modes result:


offset + register      Useful for accessing fields
                       within a record, where register
                       points to the beginning of the
                       record. Useful for accessing
                       items in a stack frame, where
                       register is r3, the register used
                       for pointing to the beginning of
                       the stack frame.


register + register    Useful for two-dimensional
                       arrays or for array access within
                       the stack frame.


register               Useful as the end result of any
                       arbitrary address calculation.


offset                 Absolute address into the first
                       or last 32K of the logical
                       address space.


In addition, the floating-point load and store
instructions may select autoincrement addressing. In
this mode isrc2 is replaced by the sum of isrc1 and isrc2
after performing the load or store. This mode makes
stepping through arrays more efficient, because it
eliminates one address-calculation instruction.
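

As an illustration only, the address calculation described in this
section can be sketched in Python. The function and parameter names
below are hypothetical. The sketch assumes that register r0 reads as
zero (the "null" register case) and that the 16-bit immediate is
sign-extended, which is how an offset alone can reach the first or
last 32K of the address space.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed offset."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(regs, isrc2, isrc1_reg=None, offset=0,
                      autoincrement=False):
    """Compute isrc1 + isrc2 for a load or store.

    regs is a list of 32 integer register values with regs[0] == 0
    (assumption: r0 always reads as zero). isrc1 is either a register
    number (isrc1_reg) or a 16-bit immediate (offset).
    """
    if isrc1_reg is not None:
        src1 = regs[isrc1_reg]
    else:
        src1 = sign_extend16(offset)
    address = (src1 + regs[isrc2]) & MASK32
    if autoincrement and isrc2 != 0:
        regs[isrc2] = address     # isrc2 is replaced by the sum
    return address


regs = [0] * 32
regs[3] = 0x2000                  # r3: start of the stack frame
regs[5] = 0x40                    # an index value
field_addr = effective_address(regs, 3, offset=8)         # offset + register
array_addr = effective_address(regs, 3, isrc1_reg=5)      # register + register
top_addr = effective_address(regs, 0, offset=0xFFFC)      # offset only
```

With autoincrement=True the same call also writes the computed sum
back to the isrc2 register, so a loop can step through an array
without a separate add instruction.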


2.8 Traps and Interrupts 


Traps are caused by exceptional conditions detected
in programs or by external interrupts. Traps
cause interruption of normal program flow to execute
a special program known as a trap handler.
Traps are divided into the types shown in Table 2.10.


2.8.1 TRAP HANDLER INVOCATION 


This section applies to traps other than reset. When
a trap occurs, execution of the current instruction is
aborted. Except for bus error and parity error traps,
the instruction is restartable. The processor takes
the following steps while transferring control to the
trap handler:


1. Copies U (user mode) of the psr into PU
   (previous U).


2. Copies IM (interrupt mode) into PIM (previous
   IM).


3. Sets U to zero (supervisor mode).


4. Sets IM to zero (interrupts disabled).


5. If the processor is in dual-instruction mode, it sets
   DIM; otherwise it clears DIM.


6. If the processor is in single-instruction mode and
   the next instruction will be executed in
   dual-instruction mode or if the processor is in
   dual-instruction mode and the next instruction will
   be executed in single-instruction mode, DS is set;
   otherwise, it is cleared.


7. The appropriate trap type bits in psr and epsr are
   set (IT, IN, IAT, DAT, FT, OF, IL, PI, PT, BEF,
   PEF). Several bits may be set if the corresponding
   trap conditions occur simultaneously.


8. An address is placed in the fault instruction
   register (fir) to help locate the trapped instruction.
   In single-instruction mode, the address in fir is the
   address of the trapped instruction itself. In
   dual-instruction mode, the address in fir is that of
   the floating-point half of the dual instruction. If an
   instruction or data access fault occurred, the
   associated core instruction is the high-order half of
   the dual instruction (fir + 4). In dual-instruction
   mode, when a data access fault occurs in the
   absence of other trap conditions, the floating-point
   half of the dual instruction will already have
   been executed (except in the case of the fxfr
   instruction).


The processor begins executing the trap handler by
transferring execution to virtual address
0xFFFFFF00. The trap handler begins execution in
single-instruction mode. The trap handler must
examine the trap-type bits in psr (IT, IN, IAT, DAT, FT)
and epsr (OF, IL, PT, PI, BEF, PEF) to determine the
cause or causes of the trap.
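

The state changes in steps 1 through 7 can be summarized with a small
Python model. This is a sketch for illustration, not a description of
the hardware implementation: the function name enter_trap is
hypothetical, and for brevity the psr and epsr bits are kept in one
dictionary rather than in two separate registers.

```python
TRAP_VECTOR = 0xFFFFFF00   # virtual address of the trap handler


def enter_trap(state, in_dual_mode, next_in_dual_mode, trap_bits):
    """Return (new_state, handler_address) after trap entry.

    state maps bit names to 0 or 1. in_dual_mode describes the trapped
    instruction; next_in_dual_mode describes the instruction after it.
    trap_bits lists the trap-type bits to set (for example ["DAT"]).
    """
    new = dict(state)
    new["PU"] = state["U"]                              # step 1
    new["PIM"] = state["IM"]                            # step 2
    new["U"] = 0                                        # step 3
    new["IM"] = 0                                       # step 4
    new["DIM"] = int(in_dual_mode)                      # step 5
    new["DS"] = int(in_dual_mode != next_in_dual_mode)  # step 6
    for bit in trap_bits:                               # step 7
        new[bit] = 1
    return new, TRAP_VECTOR


# A data access fault taken in user mode on the last single-mode
# instruction before dual-instruction mode begins.
after, handler = enter_trap({"U": 1, "IM": 1}, False, True, ["DAT"])
```

A handler written against this model would read PU and PIM to learn
the interrupted context, and DIM and DS to decide how to resume.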


2.8.2 INSTRUCTION FAULT 


This fault is caused by any of the following
conditions. In all cases the processor sets the IT bit
before entering the trap handler.


1. By the trap instruction. When trap is executed in
   dual-instruction mode, the floating-point
   companion of the trap instruction is not executed
   before the trap is taken.


2. By the intovr instruction. The trap occurs only if
   OF in epsr is set when intovr is executed. To
   distinguish between cases 1 and 2, the trap
   handler must examine the instruction addressed by
   fir. The trap handler should clear OF before
   returning. When intovr causes a trap in
   dual-instruction mode, the floating-point companion
   of the intovr instruction is completely executed
   before the trap is taken.
3. By violation of lock/unlock protocol, explained
   below. (Note that trap and intovr should not be
   used within a locked sequence; otherwise, it
   would be difficult to distinguish between this and
   the prior cases.)


4. By execution of an instruction that uses a pipeline
   when the PT bit of epsr is set. (Refer to section
   2.8.2.2.)


Table 2.10. Types of Traps

                   Indication           Caused by
Type               psr  epsr     fsr     Condition                   Instruction

Instruction        IT                    Software traps              trap
Fault              IT   OF               Software traps              intovr
                   IT   IL               Missing unlock              Any
                   IT   PT & PI          Pipeline usage              Any scalar or pipelined
                                                                     instruction that uses a
                                                                     pipeline

Floating-Point     FT            SE      Floating-point source       Any M- or A-unit except
Fault                                    exception                   fmlow
                   FT                    Floating-point result       Any M- or A-unit except
                                         exception:                  fmlow, pfgt, and pfeq.
                                 AO, MO    overflow                  Reported on any F-P
                                 AU, MU    underflow                 instruction, pst, fst, and
                                 AI, MI    inexact result            sometimes fld, pfld, and
                                                                     ixfr

Instruction        IAT                   Address translation         Any
Access Fault                             exception during
                                         instruction fetch

Data Access        DAT                   Load/store address          Any load/store
Fault                                    translation exception
                   DAT                   Misaligned operand address  Any load/store
                   DAT                   Operand address matches     Any load/store
                                         db register

Parity Error       IN   PEF              Parity error on data pins during bus read
Fault                                    operation when PEN# pin active

Bus Error Fault    IN   BEF              External interrupt signal on BERR pin

Interrupt          IN   INT              External interrupt signal on INT pin

Reset              None PEF, BEF         Hardware RESET signal


2.8.2.1 Lock Protocol


The lock protocol requires the following sequence of
activities:


1. lock


2. Any load or store instruction. For compatibility
   with future processor generations, this should be
   a load.


3. unlock


4. Any load or store instruction. For compatibility
   with future processor generations, this should be
   a store.


There may be other instructions between any of
these steps. The bus is locked after step 2, and
remains locked until step 4. Step 4 must follow step 1
by 30 instructions or less; otherwise, an instruction
trap occurs. In case of a trap, IL is also set. If the
load or store instruction of step 2 accesses a
previously unaccessed page (A = 0), the bus is locked
briefly while the A bit is set, unlocked, then locked
again to satisfy the lock instruction and start the
locked sequence.


2.8.2.2 Using PT and PI Bits


The PI and PT bits are provided to help the trap
handler avoid unnecessarily saving and restoring the
pipelines (refer to the section "Pipeline Preemption"
in the i860 Microprocessor Family Programmer's
Reference Manual).


Trap handlers that use PI or PT must initially
examine fsr. If a pending trap exists (that is, if the FTE
(floating-point trap enable) bit is set and any of the
floating-point exception bits (AI, AO, AU, MI, MO,
MU) is active), the trap handler must save the
pipelines. The i860 XP microprocessor, like the i860 XR
microprocessor, may set an fsr exception bit before
the floating-point trap is generated, and this pending
trap relies on information in the pipeline. For
example, an external interrupt might invoke the trap
handler between the scalar floating-point instruction
that produces an overflow and the next floating-point
operation, which is the one that would cause a branch
to the trap handler for the floating-point trap.


If no pending trap exists, the handler may follow
either of the following two methods:


• Using both PT and PI: Upon invocation, the trap
  handler saves the state of PI and PT (in epsr),
  but does not save the pipes. If PI is found set
  (which means that the interrupted code needs
  the state information currently in the
  floating-point pipelines), the handler sets PT and
  clears PI (with a single st.c to epsr instruction),
  then continues with trap processing. If the pipes are
  used during trap handling (even by a scalar
  instruction), a trap will be generated with IT and PI
  set by hardware. The trap handler may then check PI
  and PT, and if both are set, clear PT, PI, and IT,
  save the pipes, set an indication that they were
  saved, and restart execution from the instruction
  that caused the trap. At the end of trap handling,
  the trap handler restores the pipes if they were
  saved, and restores PI and PT to their values
  before the trap. This method avoids both saving and
  restoring the pipes, assuming that most trap
  handling sequences do not alter the pipes, and
  therefore a trap for PT = 1 will not happen very
  often.


• Using only PI: Another approach is to leave
  PT = 0, using only the PI bit, which the processor
  sets each time a pipelined instruction or pfld is
  encountered (even if the floating-point instruction
  is suppressed due to KNF = 1). The trap handler
  saves PI, saves the pipes if PI is set, sets an
  indication that they were saved, and clears PI. At
  the end of trap handling, the trap handler restores
  the pipes if they were saved, and restores PI to its
  value before the trap. With this method, the pipes
  are sometimes saved and restored unnecessarily
  if the trap handler code does not use the pipes.
  This method is advised when it is known that the
  trap handler uses the pipes.


2.8.3 FLOATING-POINT FAULT


The floating-point fault is reported on floating-point
instructions, pst, fst, and sometimes fld, pfld, and
ixfr. The floating-point faults of the i860 XP
microprocessor support the floating-point exceptions
defined by the IEEE standard as well as some other
useful classes of exceptions. The i860 XP
microprocessor divides these into two classes: source
exceptions and result exceptions. The numerics library
supplied by Intel provides the IEEE standard default
handling for all these exceptions.


2.8.3.1 Source Exception Faults


All exceptional operands, including infinities,
denormalized numbers and NaNs, cause a floating-point
fault and set SE in the fsr. Source exceptions are
reported on the instruction that initiates the
operation. For pipelined operations, the pipeline is not
advanced.


SE is undefined for faults on fld, pfld, fst, pst, and
ixfr instructions under these conditions:


• In single-instruction mode, always.


• In dual-instruction mode, when the companion
  instruction is not a multiplier or adder operation.


2.8.3.2 Result Exception Faults


The result exceptions include:


• Overflow. The absolute value of the rounded true
  result would exceed the largest positive finite
  number in the destination format.


• Underflow (when FZ is clear). The absolute value
  of the rounded true result would be smaller than
  the smallest positive finite number in the
  destination format.


• Inexact result (when TI is set). The result is not
  exactly representable in the destination format.
  For example, the fraction 1/3 cannot be precisely
  represented in binary form. This exception occurs
  frequently and indicates that some (generally
  acceptable) accuracy has been lost.


The point at which a result exception is reported
depends upon whether pipelined operations are being
used:


• Scalar (nonpipelined) operations. Result
  exceptions are reported on the next floating-point,
  fst.x, or pst.x (and sometimes fld, pfld, ixfr)
  instruction after the scalar operation. When a trap
  occurs, the last stage of the affected unit
  contains the result of the scalar operation.


• Pipelined operations. Result exceptions are
  reported when the result is in the last stage and the
  next floating-point (and sometimes fld, pfld, ixfr)
  instruction is executed. When a trap occurs, the
  pipeline is not advanced, and the last-stage
  results (that caused the trap) remain unchanged.


When no trap occurs (either because FTE is clear or
because no exception occurred), the pipeline is
advanced normally by the new floating-point operation.
The result-status bits of the affected unit are
undefined until the point that result exceptions are
reported. At this point, the last-stage result-status
bits (bits 29..22 and 16..9 of the fsr) reflect the
values in the last stages of both the adder and
multiplier. For example, if the last-stage result in the
multiplier has overflowed and a pfadd is started, a trap
occurs and MO is set.


For scalar operations, the RR bits of fsr report in 
which register the result was stored. RR is updated 
when the scalar instruction is initiated. The result ex- 
ception trap, however, occurs on a subsequent in- 
struction. Programmers must prevent intervening 
stores to fsr from modifying the RR bits. Prevention 
may take one of the following forms: 


• Before any store to fsr when a result exception
  may be pending, execute a dummy floating-point
  operation to trigger the result-exception trap.


• Always read from fsr before storing to it, and
  mask updates so that the RR bits are not
  changed.
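

The second form, a masked read-modify-write of fsr, can be sketched
as follows. This Python fragment only illustrates the bit
manipulation. The position of the RR field (five bits at 21..17) is
an assumption of this sketch and should be checked against the fsr
layout, and merge_fsr is a hypothetical helper name.

```python
RR_SHIFT = 17                  # ASSUMPTION: RR occupies fsr bits 21..17
RR_MASK = 0x1F << RR_SHIFT
MASK32 = 0xFFFFFFFF


def merge_fsr(current_fsr, wanted_fsr):
    """Build the value to store to fsr without disturbing RR.

    current_fsr is the value just read from fsr; wanted_fsr holds the
    new settings for every other field.
    """
    keep_rr = current_fsr & RR_MASK
    new_fields = wanted_fsr & ~RR_MASK & MASK32
    return new_fields | keep_rr


current = (0x05 << RR_SHIFT) | 0x03     # RR = 5, some low control bits
wanted = (0x1F << RR_SHIFT) | 0x21      # caller's value with a stale RR
to_store = merge_fsr(current, wanted)
```

The stored value takes every field from the caller except RR, which
is carried over from the value that was read.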


For pipelined operations, RR is cleared; the result is
in the last stage of the pipeline of the appropriate
unit. The trap handler must flush the pipeline, saving
the results and the status bits.


In either pipelined or scalar mode, the trap handler 
must compute the result to be returned. In either 
case, the result delivered by the CPU has the same 
significand as the true result and has an exponent 
that is the low-order bits of the true result. The trap 
handler can inspect the delivered result, compute
the result appropriate for that instruction (a NaN or
an infinity, for example), and store the computed
result. If RR is nonzero, the trap handler must store
the computed result in the register specified by RR;
if RR is zero, it must load the last stage of the
pipeline with the computed result instead of the saved
result.


Result exceptions may be reported for both the ad- 
der and multiplier at the same time. In this case, the 
trap handler should fix up the last stage of both pipe- 
lines. 


2.8.4 INSTRUCTION ACCESS FAULT


This trap occurs during address translation for
instruction fetches in any of these cases:


• The address fetched is in a page whose P
  (present) bit in the page table is clear (not present).


• The address fetched is in a supervisor mode
  page, but the processor is in user mode.


• The address fetched is in a page whose PTE has
  A = 0, and the access occurs during a locked
  sequence (i.e. between lock and unlock).


Note that several instructions are fetched at one
time, either due to instruction prefetching or to
instruction caching. Therefore, a trap handler can
change from supervisor to user mode and continue
to execute instructions fetched from a supervisor
page. An instruction access trap occurs only when
the next group of instructions is fetched from a
supervisor page (up to eight instructions later). If, in
the meantime, the handler branches to a user page, no
instruction access trap occurs. No protection
violation results, because the processor does not permit
data accesses to supervisor pages while running in
user mode.


2.8.5 DATA ACCESS FAULT 


This trap results from an abnormal condition
detected during data operand fetch or store. Such an
exception can be due only to one of the following
causes:


• An attempt is being made to write to a page
  whose D (dirty) bit is clear.


• A memory operand is misaligned (is not located
  at an address that is a multiple of the length of
  the data).


• The address stored in the debug register is equal
  to one of the addresses spanned by the operand.


• The operand is in a not-present page.


• An attempt is being made from user level to write
  to a read-only page or to access a
  supervisor-level page.


• The operand is in a page whose PTE has A = 0,
  and the access occurs during a locked sequence
  (i.e. between lock and unlock).


• Write protection (determined by epsr bit WP = 1)
  is violated in supervisor mode.
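

Two of the causes above are simple address tests. The Python sketch
below shows them for illustration; the function names are
hypothetical, and the sketch ignores the BR and BW enable bits that
qualify a real data breakpoint.

```python
def is_misaligned(address, length):
    """True when address is not a multiple of the operand length."""
    return address % length != 0


def spans_breakpoint(db, address, length):
    """True when db equals one of the addresses the operand covers."""
    return address <= db < address + length


# An 8-byte operand at 0x1004 is misaligned; at 0x1008 it is aligned.
bad = is_misaligned(0x1004, 8)
good = is_misaligned(0x1008, 8)

# A 4-byte operand at 0x1004 covers 0x1004..0x1007.
hit = spans_breakpoint(0x1006, 0x1004, 4)
miss = spans_breakpoint(0x1008, 0x1004, 4)
```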


When a data access trap is taken on a pipelined 
floating-point instruction that occurs immediately af- 
ter the load or store instruction that causes the trap, 
the destination register of the pipelined floating-point 
instruction may be partially updated. Correct execu- 
tion will occur when the trap handler resumes execu- 
tion after handling the DAT, because the pipelined 
floating-point instruction will then correctly update its 
destination register. 
2.8.6 PARITY ERROR TRAP


If the PEN# pin is active and the bus unit detects a
parity error during a bus read operation, the
processor sets PEF and IN, then generates a trap. Further
parity error traps are masked as soon as PEF is set.
To reenable such traps, software must clear PEF
and unfreeze BEAR by executing ld.c bear, rdest.


The interrupted program is not restartable. BS (bus
or parity error trap in supervisor mode) is set by the
i860 XP microprocessor when a parity error occurs
while the processor is in supervisor mode. The
operating system can use this bit to decide, for example,
whether to abort the process (user mode) or reboot
the system (supervisor mode).


2.8.7 BUS ERROR TRAP 


When external hardware asserts the BERR pin, the
processor sets BEF (bus error flag) and IN
(interrupt), and then traps. Further BERR traps are
masked as soon as BEF is set by hardware. To
reenable such traps, software must clear BEF and
unfreeze BEAR by executing ld.c bear, rdest.


BS (bus or parity error trap in supervisor mode) is set
by the i860 XP microprocessor when a bus error
occurs while the processor is in supervisor mode. The
operating system can use this bit to decide, for
example, whether to abort the process (user mode) or
reboot the system (supervisor mode).


2.8.8 INTERRUPT TRAP


An interrupt is an event that is signaled from an
external source. If the processor is executing with
interrupts enabled (IM set in the psr), the processor
sets the interrupt bit IN in the psr and INT in the
epsr, then generates an interrupt trap.


Vectored interrupts are implemented by interrupt
controllers and software. Software can use the ldint
instruction to generate an interrupt acknowledge
(INTA) cycle. This instruction generates a bus cycle
with INTA cycle specifications, and places the data
returned from the bus in the destination register.
Tags are not checked in the data cache for hit, and
the cycle is not burstable.


The Intel486 microprocessor generates two INTA
cycles as a response to an interrupt and inserts four
idle clocks in between. To generate an interrupt
acknowledge sequence that is compatible with the
Intel486 microprocessor, the ldint instruction
sequence documented in section 5.1.4 should be
executed.


2.8.9 RESET TRAP


When the i860 XP microprocessor is reset,
execution begins in single-instruction mode at virtual
address 0xFFFFFF00. This is the same address as for
other traps. The reset trap can be distinguished from
other traps by the fact that no trap bits are set. The
instruction cache is flushed. The bits DPS, BL, and
ATE in dirbase are cleared. CS8 is initialized by the
value at the INT pin at the end of reset. The
read-only fields of the epsr are set to identify the
processor, while the IL, WP, and PBM bits are cleared.
The bits U, IM, BR, and BW in psr are cleared, as are
the trap bits FT, DAT, IAT, IN, and IT. All other bits of
psr and all other register contents are undefined.
Refer to Table 2.11 for a summary of these initial
settings.


The software must ensure that the control registers
are properly initialized before performing operations
that depend on the values of those registers.


Reset code must initialize the floating-point pipeline
state to zero with floating-point traps disabled to
ensure that no spurious floating-point traps are
generated.


After a RESET the i860 XP microprocessor starts
execution at supervisor level (U = 0). Before
branching to the first user-level instruction, the RESET
trap handler or subsequent initialization code has to
set PU and a trap bit so that an indirect branch
instruction will copy PU to U, thereby changing to user
level.


2.9 Debugging


The i860 XP microprocessor supports debugging
with both data and instruction breakpoints. The
features of the i860 XP microprocessor architecture
that support debugging include:


• db (data breakpoint register), which permits
  specification of a data address that the i860 XP
  microprocessor will monitor.


• BR (break read) and BW (break write) bits of the
  psr, which enable trapping of either reads or
  writes (respectively) to the address in db.


• DAT (data access trap) bit of the psr, which
  allows the trap handler to determine when a data
  breakpoint was the cause of the trap.


• trap instruction that can be used to set
  breakpoints in code. Any number of code breakpoints
  can be set. The values of the isrc1 and isrc2
  fields help identify which breakpoint has
  occurred.


• IT (instruction trap) bit of the psr, which allows
  the trap handler to determine when a trap
  instruction was the cause of the trap.
Table 2.11. Register and Cache Values after Reset

Register                   Initial Value

Integer Registers          Undefined
Floating-Point Registers   Undefined
psr                        U, IM, BR, BW, FT, DAT, IAT, IN, IT = 0;
                           others are undefined
epsr                       IL, WP, PBM, BE, PT = 0; BEF, PEF = 1;
                           Processor Type, Stepping Number, DCS,
                           SO are read only; others are undefined
db                         Undefined
dirbase                    DPS, BL, LB, ATE = 0; others are undefined
fir                        Undefined
fsr                        Undefined
bear                       Undefined
p3-p0                      Undefined
ccr                        CO, DO = 0; others are undefined
KR, KI, T, MERGE           Undefined
NEWCURR                    Undefined
STATUS                     InLoop, Nested, Detached = 0

Cache                      Initial Value

Instruction Cache          All entries invalid
Data Cache                 All entries invalid
TLB                        All entries invalid


3.0 ON-CHIP CACHES


By holding data, instructions, and address
translation on-chip, the caches of the i860 XP
microprocessor provide the following advantages:


1. Low chip count for the CPU subsystem.


2. Wide processor-to-cache path: 16 bytes for data,
   8 bytes for instructions.


3. Fast access without requiring much additional
   high-speed design in the system. The fast
   (50 MHz) cache-access circuitry is hidden on
   chip; the external bus can respond more slowly
   without significantly degrading performance.


3.1 Address Translation Caches


The i860 XP microprocessor allows both four-Kbyte
and four-Mbyte page sizes, and a separate
translation look-aside buffer (TLB) is used to cache
address translation information for each page size. The
TLB for four-Kbyte pages (Figure 3.1) has 64 entries,
and the TLB for four-Mbyte pages (Figure 3.2) has
16 entries. Both are four-way set associative. The
TLBs function when paging is enabled. When a page
is first accessed, its translation information is saved
in the appropriate TLB along with other page
attributes, such as access rights and cacheability. Every
address translation operation looks up the virtual
address simultaneously in both TLBs. Only if the
necessary paging information is not in either of the
caches must the paging tables in memory be
referenced. Both TLBs employ a random replacement
algorithm to choose which of the four ways to replace.
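

The entry counts and associativity given above fix the number of sets
in each TLB. The short Python sketch below works through that
arithmetic. The set_index function is an assumption added for
illustration: it selects a set from the low-order bits of the virtual
page number, which is a common arrangement but is not stated in this
section (Figures 3.1 and 3.2 define the actual organization).

```python
def tlb_sets(entries, ways=4):
    """Number of sets in a set-associative TLB."""
    return entries // ways


def set_index(vaddr, page_shift, entries, ways=4):
    """ASSUMED mapping: low bits of the virtual page number pick the set."""
    vpn = vaddr >> page_shift
    return vpn % tlb_sets(entries, ways)


sets_4k = tlb_sets(64)      # four-Kbyte page TLB
sets_4m = tlb_sets(16)      # four-Mbyte page TLB

# Page shifts follow from the page sizes: 2**12 = 4 Kbytes, 2**22 = 4 Mbytes.
index_4k = set_index(0x00012000, 12, 64)
index_4m = set_index(0x01400000, 22, 16)
```

Under these figures the four-Kbyte TLB has 16 sets and the four-Mbyte
TLB has 4 sets, each holding four candidate entries for the random
replacement described above.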


If an instruction's virtual address is found in the
instruction cache, the virtual address is not translated,
and code access rights are not verified. However,
when an instruction's virtual address is not found in
the cache, address translation does occur, and all
access rights are verified. The virtual addresses of
data are always translated, and access rights are
always verified.


The i860 XP microprocessor requires simultaneous
access to data and instruction caches, but the TLBs
can service only one address translation at a time.
Data address translation has higher priority in the
TLBs than instruction address translation, if both are
required at the same time.


Any data or instruction access fault halts address 
translation at once, and the TLB is not updated. If a 
directory read causes an access fault, the page ta- 
ble is not read at all. 


If the paging unit generates a fault (in setting the D 
bit for the first write to a nondirty page, for example), 
the corresponding entry is deleted from the TLB. 
Therefore, software does not need to invalidate the 
TLB entry in response to DAT or IAT faults. 
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Figure 3.1. 4K TLB Organization 
If TLB replacement is initiated during a locked
sequence generated by the lock instruction and if
another locked sequence has to be executed to set the
A-bit, the paging unit generates an access fault. This
helps external hardware implement "locking by
address" by preventing generation of nested lock
sequences.


3.2 Internal Instruction and Data 
Caches 


The i860 XP microprocessor has separate data and 
instruction caches on-chip. Having separate caches 
for instructions and data allows simultaneous cache 
look-up. Up to two instructions and 128 bits of data 
can be accessed simultaneously from these caches. 
The data and instruction caches hold 16 Kbytes 
each. A line can be filled from memory with a four- 
transfer burst. 


The caches are fully transparent to applications soft- 
ware. Snooping (address monitoring) is designed 
into both instruction and data caches, to maintain 
cache consistency in multiprocessor systems. 


Each cache has two sets of tags: virtual tags used
for internal access, and physical tags used for
snooping. Figure 3.3 shows how the bits of both
virtual and physical addresses are mapped for
caching. The presence of both virtual and physical tags
supports aliasing, a situation in which the TLBs
associate a single physical address with two or more
virtual addresses.


Any area of memory can be cached, although both
software and hardware can disallow certain areas
from being cached: software by setting the CD bit in
their page table entries; hardware by deasserting the
KEN# signal for bus cycles with addresses that fall
in those areas. (Data reads from the two four-Kbyte
pages pointed to by the CCUBASE field of ccr are
not cached, and the CACHE# signal is inactive, if
the DCCU is activated by setting CO of the ccr
register. This is independent of the value of KEN#.)
When both software and hardware agree that a
requested datum is cacheable, the i860 XP
microprocessor fetches an entire 32-byte line and places it
into the appropriate cache. Cache line fills are
generated only for read misses, not for write misses. A
store that misses the cache does not copy the
missed line into cache from memory, but rather
posts the datum in a write buffer, then sends it to the
external bus when the bus is available.
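

The cacheability rules above can be summarized as a small decision
function. The following Python sketch is illustrative only: the function
name and its return strings are hypothetical, and the DCCU exception for
the CCUBASE pages is deliberately left out.

```python
def on_cache_miss(is_write: bool, cd_bit: int, ken_asserted: bool) -> str:
    """Model what a cache miss leads to (simplified; ignores the DCCU case).

    cd_bit       -- CD bit of the page table entry (1 = caching disabled)
    ken_asserted -- True when external hardware asserts KEN# for the cycle
    """
    if is_write:
        # Write misses never fill a line; the datum is posted to a write buffer.
        return "post to write buffer"
    if cd_bit == 0 and ken_asserted:
        # Software and hardware both agree the datum is cacheable.
        return "32-byte line fill"
    return "noncacheable read"


if __name__ == "__main__":
    print(on_cache_miss(False, 0, True))
    print(on_cache_miss(False, 1, True))
    print(on_cache_miss(True, 0, True))
```

Either party can veto caching on its own, which is why the fill case
requires both conditions.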
Figure 3.3. Cache Address Usage


3.2.1 DATA CACHE


Figure 3.4 shows the organization of the data cache. 
The data cache has two status bits per physical tag 
and one validity status bit for the virtual tag. A virtual 
tag hit is possible only when the validity bit of the 
virtual tag is set and the state of the physical tag is
M, E, or S.


Aliasing support is built into the cache look-up algo- 
rithm. Even though a physical line may be aliased, 
the processor never enters the line twice in the data 
cache. If a virtual address is not found among the 
virtual tags in the data cache, a bus cycle is initiated 
(except a read is not issued at this time if the bus 
pipeline is full) and, at the same time, the physical 
tags are searched for the physical address (which by 
this time has been retrieved from the paging unit). 
For reads, if the physical address is found, the data 
returned from the bus is ignored, on-chip data is
used, and the virtual tag is replaced with the new
one. For writes, if a virtual address is not found, the
write is issued on the bus and memory is updated. If 
the physical address is found, the line in cache is 
updated, and the virtual tag is replaced with the new 
one. However, the cache state (M, E, or S) of the 
physical-address tag does not change when the vir- 
tual tag is overwritten. 
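

The look-up sequence described above can be modeled in a few lines. The
Python sketch below is a simplified illustration, not the processor's
actual implementation: the class and method names are hypothetical, the
cache is modeled as a dictionary keyed by physical line address, and bus
cycles are not simulated.

```python
class DataCacheModel:
    """Toy model of virtual/physical tag look-up with alias handling."""

    VALID_STATES = ("M", "E", "S")

    def __init__(self):
        # physical line address -> {"vtag": virtual line address, "state": MESI}
        self.lines = {}

    def fill(self, vaddr, paddr, state="E"):
        self.lines[paddr] = {"vtag": vaddr, "state": state}

    def access(self, vaddr, paddr):
        # Step 1: search the virtual tags.
        for entry in self.lines.values():
            if entry["vtag"] == vaddr and entry["state"] in self.VALID_STATES:
                return "virtual hit"
        # Step 2: on a virtual miss, search the physical tags.
        entry = self.lines.get(paddr)
        if entry is not None and entry["state"] in self.VALID_STATES:
            # Aliased line: replace the virtual tag, keep the MESI state.
            entry["vtag"] = vaddr
            return "physical hit"
        return "miss"


if __name__ == "__main__":
    cache = DataCacheModel()
    cache.fill(0x1000, 0x8000, state="M")
    print(cache.access(0x2000, 0x8000))   # alias of the same physical line
    print(cache.lines[0x8000]["state"])   # state is unchanged by the retag
```

Keying the model by physical address makes the "never entered twice"
property hold by construction.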


Note that the BE (big endian) bit of epsr has no 
influence on data cache behavior. Data items are
kept in cache in exactly the same ordering as in
external memory. Byte-shifting operations invoked by
the BE bit upon loads and stores occur at the input
to the register files only.


3.2.1.1 Data Cache Update Policies 


To minimize bus traffic, a write-back policy is
normally used. The write-back policy (also called copy-back
and deferred-write) reduces bus traffic by eliminating
many unnecessary writes. Writes to a line in the
cache are not immediately forwarded to main
memory; instead, they are accumulated in the cache. The
modified cache line is written to main memory only
when its cache space is needed for other data,
when the modified data is needed by another
processor, or when a flush procedure is executed.


Under the write-back policy, a write that hits the 
cache utilizes it for two cycles (one to check the 
virtual tags for hit, another to update the cache line). 
However, the cache pipeline allows successive 
store hits to operate at one per cycle. The
processor’s internal write buffers can hold two successive
stores, preventing a freeze upon store miss.


Under a write-through policy, a write request to a line 
in the cache triggers updates to both cache and 
main memory. An address decoder, for example, 
can select the write-through policy for writes to video 
RAM, where it is necessary that writes be seen on 
the video display. Software, by setting the WT page- 
table bit, can select the write-through policy for spe- 
cific areas of memory—those that are used for inter- 
processor message queues, for example. 


A write-once policy combines write-through with
write-back. Write-through is employed for the first
write to a cache line, while subsequent writes to the
same line follow the write-back policy. Write-once is
valuable in multiprocessor systems to maintain
cache consistency with the least possible bus traffic.
The first write broadcasts to other processor nodes
the fact that a line has been modified. Write-once is
also used if a second-level cache is attached to the
i860 XP microprocessor to maintain consistency
between the first- and second-level caches.


The external system can dynamically change the
update policy (write-back, write-through, write-once) of
the i860 XP microprocessor with each cache line.
Figure 3.4. Data Cache Organization


3.2.2 INSTRUCTION CACHE


Figure 3.5 shows the organization of the instruction 
cache. The instruction cache has one validity bit that 
is common to both virtual and physical tags. Aliasing 
support for instructions consists not simply of chang- 
ing the virtual tag, but rather fetching a line whenev- 
er a virtual tag miss occurs. If the physical address 
already exists in the instruction cache, its line and its 
tags are overwritten. So, even though a physical line 
may be aliased, the processor never enters the line 
twice in the instruction cache. 


3.2.3 CACHE REPLACEMENT ALGORITHM 


The data, instruction, and address-translation 
caches all use similar algorithms to choose which of 
the four cache blocks will be overwritten when a 
miss causes a line fetch. 


First, the first invalid line (if any) in a set of four is 
replaced (in the order 0, 1, 2, 3). When there are no 
more invalid lines in a set, a pseudorandom replace- 
ment algorithm chooses which valid lines to replace. 
The algorithm is controlled by counters inside the 
chip. RESET initializes these counters to zero, so 
that the “randomness” is deterministic and two 
i860 XP CPUs executing the same code on identical 
boards have exactly the same series of cache hits, 
misses, and replacements. 


Setting ITI to invalidate the caches and TLBs also
resets the counters used to select the set used for 
cache line replacement. This brings the i860 XP mi- 
croprocessor cache-replacement mechanism to a 
known state without resetting the whole chip. 
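

The selection order can be sketched as follows. This Python model is an
illustration under stated assumptions: the text does not specify how the
on-chip counters produce the pseudorandom choice, so a simple modulo-4
counter stands in for them, and the class and method names are
hypothetical.

```python
class ReplacementModel:
    """Toy model: first invalid way wins, otherwise a counter picks the way."""

    WAYS = 4

    def __init__(self):
        self.counter = 0          # RESET initializes the counter to zero

    def reset(self):
        # Models RESET, or setting ITI, returning the counter to a known state.
        self.counter = 0

    def choose(self, valid):
        """valid -- list of four booleans, one per way of the set."""
        for way in range(self.WAYS):      # invalid ways first, in order 0..3
            if not valid[way]:
                return way
        way = self.counter % self.WAYS    # stand-in for the real counter logic
        self.counter += 1
        return way


if __name__ == "__main__":
    a, b = ReplacementModel(), ReplacementModel()
    full = [True] * 4
    # Two models started from reset make identical choices.
    print([a.choose(full) for _ in range(6)] == [b.choose(full) for _ in range(6)])
```

Whatever the real counter logic is, the point of the model is that the
choice depends only on state that reset makes identical.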


When the flush instruction is used to write back 
modified lines in the data cache, the flush routine 
must alter the RC (replacement control) field of 
dirbase. Therefore, replacement is not random.
Instead, the block (or “way”) replaced is the one
selected by the RB (replacement block) field of
dirbase.


3.2.4 CACHE CONSISTENCY PROTOCOL


The i860™ XP Microprocessor implements cache
consistency via its use of a MESI (Modified,
Exclusive, Shared, Invalid) protocol.


3.2.4.1 Data Cache States 


Each line of the data cache of the i860 XP micro- 
processor can be in one of the states defined in Ta- 
ble 3.1. Note that the instruction cache of the 
i860 XP only implements the “SI” part of the MESI
protocol, because the instruction cache is not
writable.


Figure 3.5. Instruction Cache Organization


Table 3.1. MESI Cache Line States


State    A write to this line...

M        does not go to bus
E        does not go to bus
S        goes to bus and updates the cache
I        goes to bus
Table 3.2. Internally Initiated Cache State Transitions


Present    Next State                  Next State
State      after Read                  after Write*

I          Line fill;                  Write-through;
           if WB/WT# = 1, E; else S    I
S          S                           Write-through;
                                       if WB/WT# = 1, E; else S
E          E                           M
M          M                           M


NOTE:
* “Write” does not include write-backs due to replacement.
Those can only cause an M to I transition.


The state of a cache line can change as the result of
either internal or external activity related to that line.
Table 3.2 presents the line state transitions that
result from internal activity of the i860 XP
microprocessor in the data cache.


External cache-consistency support is provided
through inquiry cycles. Inquiry cycles are initiated by
other processors in a multiprocessor system to
check whether an address is cached in the internal
cache of the i860 XP microprocessor. Table 3.3
shows the line state transitions initiated by inquiry
cycles.


Table 3.3. Inquiry-Initiated 
Cache State Transitions 


3.2.4.2 Write-Once Policy


A write-once cache policy can be implemented
through use of the WB/WT# input pin. The signal
on this pin is sampled in both read and write cycles.
A read miss causes a line to enter either S or E after
the line fill. If WB/WT# is sampled LOW at the time
of NA# or the first BRDY# activation, the line
enters S state, forcing the next write hit to this line to
show up on the bus. If WB/WT# is sampled HIGH,
the line enters E state. In write-through cycles, the
state of a line is changed from S to E when
WB/WT# is sampled HIGH, so that subsequent writes
will not be written through to the bus. Thus, if this
signal is driven LOW on read cycles and HIGH on
write cycles, a write-once cache policy is
implemented. The easiest way to implement write-once (in
systems not using the 82495XP cache controller) is to
tie this pin to the W/R# output of the processor.


If the WT bit in the page table entry is set, the
i860 XP microprocessor ignores the WB/WT#
signal for the cycles that hit that page and always
performs a write-through. In other words, hardware
cannot override software’s selection of the
write-through policy.
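

The write-once behavior can be traced with a small state function. The
Python sketch below is a simplified model, not a description of the
silicon: the function name is hypothetical, read misses are assumed to be
cacheable, and a set WT bit is modeled as if WB/WT# were always sampled
LOW, which is an assumption made for illustration.

```python
def next_state(state: str, op: str, wbwt_high: bool, wt_bit: bool = False):
    """Return (next MESI state, True if a bus cycle is issued)."""
    if wt_bit:
        wbwt_high = False            # assumption: WT set behaves like WB/WT# LOW
    if op == "read":
        if state == "I":             # read miss: line fill enters E or S
            return ("E" if wbwt_high else "S"), True
        return state, False          # read hit: no bus cycle, state unchanged
    if op == "write":
        if state == "I":             # write miss: no line fill
            return "I", True
        if state == "S":             # write-through; may be promoted to E
            return ("E" if wbwt_high else "S"), True
        return "M", False            # E or M: write stays in the cache
    raise ValueError("op must be 'read' or 'write'")


if __name__ == "__main__":
    # WB/WT# tied to W/R#: LOW on reads, HIGH on writes.
    state = "I"
    for op in ("read", "write", "write", "write"):
        state, on_bus = next_state(state, op, wbwt_high=(op == "write"))
        print(op, state, on_bus)
```

With WB/WT# tied to W/R#, only the first write after the fill reaches
the bus; later writes to the same line stay in the cache.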


3.2.4.3 Locked Access


Locked accesses are those data loads and stores
that occur after a lock instruction up to and including
the first load or store after the corresponding unlock
instruction.


State transitions for locked accesses differ from
those in Table 3.2 in ways that guarantee that
locked accesses are seen by all processors in the
system. Any locked load or store generates both a
cache look-up and an external bus cycle, regardless
of cache hit or miss.


1. In a locked read:

   a. If the required data is not found in the cache,
      the data from the bus is used. The data is
      placed in the cache if it is cacheable and
      KEN# is also asserted.

   b. If the required data is found in an unmodified
      (E or S) state, the data from the bus is used.

   c. If the data is found in the cache in a modified
      (M) state, the cached data is used, and the
      bus data is ignored, as long as no inquiry
      write-back occurs before the BRDY# of the
      bus cycle. If, however, an intervening inquiry
      write-back changes the line to S or I state, the
      bus data is used.


2. A locked store is forced through the cache and
   issued on the bus. No more data accesses occur
   until the last BRDY# for the store. If the store
   hits the internal cache, the cache update is done
   after the last BRDY# from the bus. Note that the
   line written by a locked store remains in M state
   in spite of the write-through to the bus, because
   the length of the write-through is less than the
   line size of 32 bytes.
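

The three locked-read cases reduce to a single rule about which copy of
the data is used. The Python sketch below is illustrative; the function
and parameter names are hypothetical.

```python
def locked_read_source(state: str, inquiry_writeback_before_brdy: bool = False) -> str:
    """Return which data a locked read uses: 'cache' or 'bus'.

    state -- MESI state of the line at look-up time ('I' models a miss)
    """
    if state == "M" and not inquiry_writeback_before_brdy:
        # Case c: a modified line holds the only up-to-date copy.
        return "cache"
    # Cases a and b, and case c after an intervening inquiry write-back.
    return "bus"


if __name__ == "__main__":
    for s in ("I", "S", "E", "M"):
        print(s, locked_read_source(s))
```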
Locked accesses are totally serializing in the sense
that:


1. All loads and stores that precede the lock
   instruction are issued on the bus (if they miss the
   cache) before the first locked access is issued.
   The locked access can be issued before the last
   BRDY# of the prior cycle if NA# is activated in
   response to the prior cycle.


2. No load or store after the last locked access is
   issued internally or on the bus until the final
   BRDY# for all locked accesses.


To maximize performance, instruction fetches during 
the locked sequence are not serializing. When NA# 
invokes pipelining, instruction fetches may be issued 
while locked data fetches or stores remain on the 
bus. 


3.3 Internal Cache Consistency


Both the instruction and the data caches can be
snooped by externally generated inquiry cycles, and
the result of the look-up is presented on the HIT#
and HITM# output pins. These inquiry cycles help
maintain consistency with caches of other
processors. However, software must take care not to
create inconsistencies such as the following among the
internal caches (including the TLBs):


1. Changing the address space while leaving
   virtual-address tags from the prior space in the
   instruction or data cache.


2. Changing instructions in memory (or in the data
   cache) without changing them in the instruction
   cache.


3. Changing page table information in memory (or in
   the data cache) without changing the same
   information in the TLBs.


Under certain circumstances, such as I/O
references, self-modifying code, page-table updates, or
shared data in a multiprocessing system, it is
necessary to bypass, to invalidate, or to flush the caches.
The i860 XP microprocessor provides the following
methods for doing this:


• Bypassing Instruction and Data Caches.


  1. If deasserted during cache-miss processing,
     the KEN# pin disables instruction and data
     caching of the referenced data.


  2. If the CD bit of the associated page table is
     set, caching of a page is disabled. The value of
     the CD bit is output on the PCD pin for use by
     external caches.


  3. If the WT bit of the associated page table is
     set, caching is not disabled, but writes pass
     through the cache. The value of the WT bit is
     output on the PWT pin for use by external
     caches. (Note that WT does not affect policy
     for the instruction cache, because the
     instruction cache is not writable. However, when an
     instruction from a page having the WT bit of
     the PTE set is placed in the data cache, the
     write-through policy applies just as for a data
     page.)

• Invalidating Cache Entries. Storing to the
  dirbase register with the ITI bit set invalidates
  each line of the instruction and
  address-translation caches. In the data cache, it invalidates the
  virtual tags, but not the physical tags.


backs. The same effect (writing back modified 
lines) can be achieved with the load instruction 
ld.l, but this would be more than twice as slow:
the load must first do four bus transfers to get 
new data, then write back the modified line. The 
flush instruction causes the write-backs without 
requiring a read from external memory to replace 
the modified line. 


3.3.1 ADDRESS SPACE CONSISTENCY 


In a multitasking virtual-address system, the
operating system may intentionally employ aliasing, where
several processes use the same physical memory
while accessing it with different virtual addresses.
When the operating system switches control from
one process to the next, it changes the DTB field of
the dirbase to point to a different page directory that
defines the new address space. When this happens,
all caches must be invalidated: the TLBs, so that the
new page directory is read into the TLBs; the data
and instruction caches, so that virtual addresses
from the new space don’t accidentally match cached
virtual addresses from the old space.


The caches are invalidated by setting the ITI bit
when writing to dirbase. Invalidating the instruction
cache invalidates both the physical and the virtual
tags, because the instruction cache has one status
(valid) bit, which is common to both physical and
virtual tags. In the data cache, setting ITI does not
invalidate physical tags. However, any modified lines
will eventually be written back when their space is
required for lines from the new address space or
when external agents on the bus express a need for
the modified data via inquiry cycles.
Note, however, that the operating system code that
flushes the caches must be present during the
flushing. Typically this code has the same virtual
address for all processes.


NOTE:
The mapping of the page(s) containing the
currently executing instruction, the next six
instructions, and any data referenced by these
instructions should not be different in the new
page tables when the DTB is changed.


Enabling or disabling address translation (via the 
ATE bit) is similar to changing the DTB, in that the 
address mapping is changed. The virtual tags in the 
data and instruction cache must be invalidated prior 
to changing ATE.


3.3.2 INSTRUCTION CACHE CONSISTENCY


When software modifies a page containing
instructions (as when a debugger replaces an instruction
with the trap instruction to set a breakpoint), the
instruction cache can become inconsistent for any of
the following reasons:


• Because the data cache uses a write-back policy,
  changes to cached instruction pages do not
  immediately update memory.


• Changes to instructions do not automatically
  update the instruction cache.


• Instruction cache misses are not checked in the
  data cache.


Software must ensure that modified lines containing
instructions are written to main memory before the
instruction cache tries to read them. There are two
methods for this:


1. Flush the data cache using the flush instruction.
   Note that to make the instruction cache
   consistent with the data cache, the data cache must be
   flushed before invalidating the instruction cache.


2. Mark all instruction pages as WT (write through)
   so that modifications to instructions are
   immediately written to memory. This is the better
   alternative.


In either case, the instruction cache must be
invalidated (by a store to dirbase with ITI set) after a
code page has been modified, so that the updated
instructions will be read from memory.


3.3.3 PAGE TABLE CONSISTENCY


When the operating system modifies page tables or
directories, the TLBs can become inconsistent with
the modifications for any of the following reasons:


• Because the data cache uses a write-back policy,
  updates to cached page tables do not
  immediately update memory.


• Changes to page tables do not automatically
  update the TLB.


• The i860 XP microprocessor searches only
  external memory for page directories and page tables
  in the translation process. The data cache is not
  searched. (Data is not transferred from the data
  cache to the TLBs during TLB replacement
  cycles.)


Software must ensure that modified lines containing
page table entries are written to main memory
before the paging unit tries to read them. There are two
methods for this:


1. Keep page tables and directories in
   noncacheable memory or write-through pages.


2. Flush the data cache using the flush instruction.


The processor itself invalidates the affected TLB
entry when a trap is triggered by the need to set the A
or D bit. In other cases, after a page table or
directory has been modified, software must invalidate the
TLBs (by a store to dirbase with ITI set) so that the
updated entries will be read from memory.


The data cache does not need flushing if the
program is modifying only the P, U, W, A, or D bits of a
PTE (as long as the page frame address is not
changed and the PTE itself is not in the data cache).
The i860 XP CPU does not use the TLB for cache
line write-backs; it writes to the address in the
physical tag.


Thus, a trap handler can service a data access trap
for D-bit zero merely by setting D=1. When setting
the P or A bits, there is no need to invalidate or flush
any caches, because the processor does not load
entries into the TLB that have P=0 or A=0.


Two potential TLB inconsistencies are avoided
automatically by the i860 XP microprocessor.


1. If the paging unit issues a write cycle (to set the A
   bit, for example), this cycle is snooped by the
   data cache for invalidation.


2. Any TLB entry that causes a DAT or IAT is
   automatically invalidated.
3.3.4 CONSISTENCY OF CACHEABILITY


Normally, an operating system ensures that the 
page attributes (CD and WT) of a memory access 
are consistent with the cache contents. However, 
the operating system can fail to maintain consisten- 
cy by the following actions: 


• Changing the CD or WT bits while related lines
  are in the cache.


• Aliasing a physical address with virtual addresses
  that have differing CD or WT bits.


In these situations, the i860 XP microprocessor 
gives priority to cache state. For example: 


1. If a read or write request is to a noncacheable 
page (CD=1), but the data (or code) is found in
cache, the request is satisfied by the cache, and
no external cycle is issued.


2. If the physical address of a read or write request 
hits in the cache but the virtual address misses, 
the virtual tag is overwritten by the new virtual 
address, but the CD bit of the new virtual address 
is ignored. 


3. If a store to a write-through page (WT = 1) hits a 
cache line in E or M state, no write-through cycle 
is issued; only the cache is updated. 


3.3.5 LOAD PIPE CONSISTENCY 


The pfld (pipelined floating-point load) instruction fa- 
cilitates transfer of data from memory to registers, 
and avoids placing data in the data cache. When 
large amounts of data are used, pfld allows the
programmer to keep rarely-used data out of the cache.
The i860 XP microprocessor ensures consistency
between cached data and pfld references. It checks
the data cache and, upon a data cache hit to a
modified line, forwards data from cache into the
three-stage pfld pipeline.
3.3.6 SUMMARY


Table 3.4 summarizes flush and invalidation
requirements, assuming that WT is set in the PTEs of
instruction and page-table pages:


Table 3.4. Summary of
Cache Flushing And Invalidation


Invalidate Caches (ITI)


Setting A                        No
Setting P                        No
Clearing P                       Yes
Setting D                        No
Changing protection (U,W)        Yes
Setting CD or WT                 Yes
Changing PFA in a used(1) PTE    Yes
Changing dirbase DTB             Yes
Changing dirbase ATE             Yes
Changing epsr WP                 No
Setting ccr DO and CO            Yes(2)
Modifying code                   No(3)  Yes


NOTES:

1. “Used” means a PTE that at some past time had P set.
2. If data from either of the CCU pages could have been
   cached.
3. Assuming all instructions and their page directories and
   page tables are in write-through or noncacheable pages.

4.0 HARDWARE INTERFACE 


In the following description of the hardware interface,
the # symbol at the end of a signal name indicates 
that the active or asserted state occurs when the 
signal is at a low voltage. When no # is present after 
the signal name, the signal is asserted when at the 
high voltage level. 
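

As a quick illustration of this naming convention, the Python sketch below
maps a signal name and a voltage level to its asserted or deasserted
state. The helper name is hypothetical.

```python
def is_asserted(signal_name: str, level_high: bool) -> bool:
    """Apply the '#' convention: a trailing '#' means active at low voltage."""
    active_low = signal_name.strip().endswith("#")
    return (not level_high) if active_low else level_high


if __name__ == "__main__":
    print(is_asserted("KEN#", level_high=False))
    print(is_asserted("HOLD", level_high=True))
    print(is_asserted("ADS#", level_high=True))
```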


4.1 Pins Overview 


Figure 4.1 identifies functional groupings of the pins. 
Table 4.1 lists every pin by its identifier, gives a brief 
description of its function, and lists some of its char- 
acteristics. All output pins are tristate, except BREQ, 
HIT#, HITM#, HLDA, LOCK#, and PCHK#. 
Table 4.1. Pin Summary


Pin           Function                       When Floated

Output Pins

ADS#          Address Status                 HLDA, clock after BOFF#
BE7#-BE0#     Byte Enable                    HLDA, BOFF#
BREQ          Bus Request
CACHE#        Cache                          HLDA, BOFF#
CTYP          Cycle Type                     HLDA, BOFF#
D/C#          Data/Code                      HLDA, BOFF#
HIT#          Snoop Hit Cache
HITM#         Snoop Hit Modified Line
HLDA          Hold Acknowledge
KB0, KB1      Cache Block                    HLDA, BOFF#
LEN           Length                         HLDA, BOFF#
LOCK#         Address Lock
M/IO#         Memory/IO                      HLDA, BOFF#
NENE#         Next Near                      HLDA, BOFF#
PCD           Page Cache Disable             HLDA, BOFF#
PCHK#         Parity Check
PCYC          Page Cycle                     HLDA, BOFF#
PWT           Page Write-Through             HLDA, BOFF#
TDO           Test Output                    Nonscan Mode
W/R#          Write/Read                     HLDA, BOFF#

Input/Output Pins

A31-A3        Address                        AHOLD, HLDA, BOFF#
D63-D0        Data                           HLDA, BOFF#
DP7-DP0       Data Parity                    HLDA, BOFF#

Input Pins

AHOLD         Address Hold
BERR          Bus Error
BOFF#         Back-Off
RSRVD#        Intel Reserved
BRDY#         Burst Ready
BYPASS#       Intel Reserved
CLK           Clock
RESET         Reset
EADS#         External Address Status
EWBE#         External Write Buffer Empty
FLINE#        Flush Line
HOLD          Bus Hold
INT/CS8       Interrupt/Code-Size 8
INV           Invalidate
KEN#          Cache Enable
NA#           Next Address
PEN#          Parity Enable
TCK           Test Clock
TDI           Test Data Input
TMS           Test Mode Select
TRST#         Test Reset
WB/WT#        Write-Back/Write-Through
SPARE         Intel Reserved
The pins D/C#, W/R#, and M/IO# define bus
cycle types. They are summarized in Table 4.2. For
data transfers to or from memory, two additional
pins, CTYP and PCYC, provide further information
regarding the type of transfer, as shown in Table 4.3.
Table 4.4 shows how the LEN and CACHE# pins
determine cycle length.


Table 4.2. ADS# Initiated Bus Cycle Definitions


M/IO#   D/C#   W/R#   Bus Cycle Initiated

0       0      0      Interrupt Acknowledge
0       0      1      Special Cycle
0       1      0      I/O Read
0       1      1      I/O Write
1       0      0      Code Read
1       0      1      Reserved
1       1      0      Memory Read
1       1      1      Memory Write
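

External logic that decodes these three pins amounts to a table look-up.
The Python sketch below is illustrative: the names are hypothetical, and
the pin encoding used (M/IO#, D/C#, W/R# read as a three-bit value from
000 for Interrupt Acknowledge to 111 for Memory Write) is an assumption
reconstructed from a damaged copy of Table 4.2.

```python
# Keys are (M/IO#, D/C#, W/R#) pin levels; encoding assumed as stated above.
BUS_CYCLES = {
    (0, 0, 0): "Interrupt Acknowledge",
    (0, 0, 1): "Special Cycle",
    (0, 1, 0): "I/O Read",
    (0, 1, 1): "I/O Write",
    (1, 0, 0): "Code Read",
    (1, 0, 1): "Reserved",
    (1, 1, 0): "Memory Read",
    (1, 1, 1): "Memory Write",
}


def decode_bus_cycle(m_io: int, d_c: int, w_r: int) -> str:
    """Return the bus cycle type for the sampled pin levels (0 or 1 each)."""
    return BUS_CYCLES[(m_io, d_c, w_r)]


def is_memory_data_transfer(m_io: int, d_c: int) -> bool:
    """PCYC and CTYP are defined only when D/C# = 1 and M/IO# = 1."""
    return m_io == 1 and d_c == 1


if __name__ == "__main__":
    print(decode_bus_cycle(1, 1, 0))
    print(is_memory_data_transfer(1, 0))
```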


Table 4.3. Memory Data Transfer Cycle Types


PCYC | CTYP | W/R#   Data Transfer Type


Normal read
Pipelined load (pfld instruction)
Page directory read
Page table read
Write-through (S-state hit)
Store miss or write-back
Page directory update
Page table update


PCYC and CTYP are defined only for memory data transfer
cycles (D/C# = 1, M/IO# = 1)
Figure 4.1. Signal Grouping


Table 4.4. Cycle Length Definition


CACHE#   Cycle Description   Burst Length


Noncacheable** 64-bit (or less) read
Noncacheable 64-bit (or less) read
64-bit (or less) write
I/O and Special Cycles
Noncacheable 128-bit read (p)fld.q
Noncacheable 128-bit read (p)fld.q
128-bit write fst.q
Cache line fill
Cache write-back


NOTE:
** Includes CS8-mode code fetches, which may be cached by the processor.
- Indicates “don’t care” values.
4.2 Signal Description


In this section descriptions of all pins are presented
in alphabetical order.


4.2.1 A31-A3 (ADDRESS PINS) 


The 29-bit address bus (A31-A3) identifies
addresses to a 64-bit location. Separate byte-enable signals
(BE7#-BE0#) identify which bytes should be
accessed within the 64-bit location.
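

To make the split between address pins and byte enables concrete, the
Python sketch below derives both from a byte address and an operand size.
It is a minimal illustration: the function name is hypothetical, and it
assumes that byte offset 0 within the 64-bit location corresponds to
BE0# (data pins D7-D0).

```python
def bus_address(byte_addr: int, size: int):
    """Return (A31-A3 value, BE7#-BE0# pin levels as an 8-bit mask).

    In the returned mask, bit n is the level of BEn#; because the byte
    enables are active low, a 0 bit means that byte is selected.
    """
    offset = byte_addr & 0x7
    if size not in (1, 2, 4, 8) or offset + size > 8:
        raise ValueError("operand must fit within one 64-bit location")
    a31_a3 = byte_addr >> 3                    # drop the three low-order bits
    selected = ((1 << size) - 1) << offset     # bit n set = byte n selected
    be_levels = ~selected & 0xFF               # invert for active-low pins
    return a31_a3, be_levels


if __name__ == "__main__":
    addr, be = bus_address(0x1004, 4)
    print(hex(addr), format(be, "08b"))
```

A 4-byte access at byte address 0x1004 therefore selects the upper four
bytes of the 64-bit location addressed by A31-A3.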


The address lines are bidirectional. The i860 XP
microprocessor drives the address lines unless it is in a
hold state. The system drives address lines A31-A5
to perform cache line inquiries (refer to the EADS#
signal description).


4.2.2 ADS# (ADDRESS STATUS) 


The i860 XP microprocessor asserts ADS# to
identify the first clock period of each bus cycle, the clock
period during which new values become valid on the
address bus and cycle-definition pins. This signal is
held active for one clock.


If BOFF# is asserted, the processor floats ADS#
two clocks after sampling BOFF# (and not, like all
other pins, on the next clock). This is to ensure that
ADS# is deasserted before it floats, and therefore is
never left floating active.


ADS# can be asserted while AHOLD is active to
initiate a cache write-back cycle.


4.2.3 AHOLD (ADDRESS HOLD)


The external system asserts AHOLD to perform a
cache inquiry. In response to assertion of AHOLD,
the i860 XP microprocessor immediately (in the next
clock) stops driving the address bus (A31-A3 lines).
The other buses remain active, and data can be
transferred for previously issued read or write bus
cycles during address hold. AHOLD is recognized
even during RESET and LOCK#. The earliest that
AHOLD can be deasserted is the clock after EADS#
is asserted to start the inquiry.


If HITM# has activated due to an inquiry, the
i860 XP microprocessor asserts ADS# while
AHOLD is active to start the write-back of the
modified line that was the target of the inquiry.


4.2.4 BE7#-BE0# (BYTE ENABLES)


The byte-enable pins are driven with the address.
BE7# applies to D63-D56, BE0# applies to
D7-D0.


In write cycles (noncacheable writes as well as
cache line write-backs), the BEn# signals determine
which bytes must be written into external memory
for the current cycle.


In read cycles, the BEn# values indicate which byte
the load instruction has requested. In all
noncacheable read cycles (CACHE# or KEN# deasserted),
the byte enables match the length and address of
the requested data. Cacheable read cycles (KEN#
asserted), however, result in four 64-bit memory
transfers to fill an entire 32-byte cache line. The
BEn# pins activated are those that represent the
operand of the load instruction that caused the line
fill, and these same BEn# pins remain activated for
as long as A31-A3. All 64 bits must be returned for
each cacheable cycle without regard for the BEn#
signals.


While in CS8 mode, BE2#-BE0# serve as
(active-high) lower-order address bits for instruction fetches
(from the ROM). Data fetches and stores are not
affected by CS8 mode, and BE2#-BE0# retain
their normal byte-enable function for data.


4.2.5 BERR (BUS ERROR)


This is a nonmaskable interrupt input, which
supports bus error handling or other urgent
circumstances. BERR is not masked by the IM bit of the
psr nor by lock cycles. When BERR is activated, the
i860 XP microprocessor vectors to the trap handler
and sets the bus error flag (BEF) in the epsr. BERR
causes the physical address of the current bus cycle
to be latched into the BEAR control register; thus, if
asserted in the clock of BRDY# or the clock after
BRDY#, it causes the bus address to be latched for
software to examine. BERR is rising-edge sensitive.
Once the trap has occurred, further BEF traps
cannot occur until software has cleared BEF and read
BEAR.


BERR does not terminate outstanding bus cycles.
Therefore, the system must still activate BRDY# a
sufficient number of times or activate BOFF# for
those cycles. Even though activating BOFF#
temporarily halts the erring cycles, the i860 XP
microprocessor will retry them when BOFF# is
deasserted, in spite of BERR.


Timing of BERR is not influenced by late back-off
mode.


4.2.6 BOFF# (BACK-OFF) 


The system can assert this signal to abort all
outstanding bus cycles that have not yet completed. In
response to BOFF#, the i860 XP microprocessor
immediately (in the next clock) floats its bus, except
for ADS#, which is floated one clock later. The
processor floats all the same pins normally floated
during bus hold; however, unlike a bus hold, HLDA is
not asserted. (HLDA is asserted only in response to
HOLD; no acknowledgment is required for BOFF#.)
Any data and BRDY# returned to the processor
while BOFF# is asserted are ignored. The
processor remains in bus hold until BOFF# is deasserted,
at which time it restarts the bus cycles by driving the
address and cycle definition pins and asserting
ADS#. When BOFF# deactivates, ADS# may be
asserted the following clock. Thus a BOFF#
duration of one clock results in not floating ADS# at all.
BOFF# cannot be used to force the pins to float
during RESET; use HOLD for that purpose.


4.2.7 BRDY# (BURST READY) 


The input BRDY# indicates either that the external
system has driven valid data on the data pins in
response to a read request or that the external system
has latched the data in response to a write request.
The CPU ignores this signal when no bus requests
are outstanding. During a bus cycle, BRDY# is
sampled at each clock, starting with the clock after
assertion of ADS# and continuing until all data for the
cycle has been transferred. When BRDY# is sampled
active in a read cycle, the data present on the
pins is sampled.


4.2.8 BREQ (BUS REQUEST) 


BREQ allows the i860 XP microprocessor to share
the local bus with other bus masters. An external
bus arbiter can use BREQ to implement an "on demand
only" policy for granting the bus to the i860 XP
microprocessor. The i860 XP microprocessor asserts
BREQ the clock after it realizes an internal request
for the bus. The system should sample this pin
only when the i860 XP microprocessor is not in
control of the bus (that is, when HLDA, BOFF#, or
AHOLD is active). BREQ is undefined when the
i860 XP microprocessor is driving the bus. BREQ
may be deasserted between assertions of ADS#,
but this does not imply that the CPU does not need
the bus.


4.2.9 BYPASS# (BYPASS)


This pin is reserved by Intel Corporation and should 
be tied HIGH to Vcc through a resistor. When LOW, 
the phase-locked loop that generates the internal 
clock is unused. In this case, the internal clock has 
more skew relative to the external CLK, and the A.C. 
timing parameters are not guaranteed. 


4.2.10 CACHE# (CACHEABILITY) 


This output signal indicates internal cacheability of a 
bus request. Its timing follows that of the address 
bus. 


The i860 XP microprocessor asserts CACHE# for
cacheable reads and code fetches to announce its
intention to cache the data. If CACHE# is asserted
on a read cycle and if the KEN# input is active, the
cycle is a burst line fill. If CACHE# is inactive in a
read cycle, the i860 XP microprocessor does not
cache the returned data, regardless of the KEN#
pin. CACHE# is also asserted for cache line
write-backs.


CACHE# is inactive for noncacheable reads (for
example, pfld, ldio, ldint), TLB replacements, and
store misses.


Table 4.4 shows how cacheability determines the 
number of data transfers in a cycle. 


Note that the CACHE# output is always inactive for
CS8 (Code-Size 8 bits) mode instruction fetches so
that the instructions are fetched with single-transfer
cycles. However, the code fetched may then be
placed in the instruction cache, unless KEN# was
inactive.


4.2.11 CLK (CLOCK)


The CLK input determines execution rate and timing
of the i860 XP microprocessor. External timing
parameters are specified relative to the rising edge of
this signal. The i860 XP microprocessor can utilize a
clock rate of 50 MHz. The internal operating
frequency is the same as the external clock. This
signal requires TTL levels.


4.2.12 CTYP (CYCLE TYPE) 


CTYP is one of the bus cycle definition signals.
Tables 4.2 and 4.3 show the types of bus cycle
generated. CTYP is defined only for data write and read
requests. The value of this pin changes only when
ADS# is asserted.


4.2.13 D/C# (DATA/CODE) 


D/C# specifies whether the current request is for
data or instructions. The data/code line is one of the
bus cycle definition pins. Tables 4.2 and 4.3 show
the types of bus cycle generated. The value of this
pin changes only when ADS# is asserted.


4.2.14 D63-D0 (DATA PINS) 


The bus interface has 64 bidirectional data pins
(D63-D0) to transfer data in eight- to 64-bit
quantities. Pins D7-D0 transfer the least significant byte;
pins D63-D56 transfer the most significant byte. In
read cycles, all 64 bits of the data bus are latched,
even in CS8-mode instruction fetches when only the
low-order eight bits are used. In write cycles, the
i860 XP microprocessor does not drive D63-D0 in
the clock of ADS#, but in the following clock.


4.2.15 DP7-DP0 (DATA PARITY)


There is one parity signal for each byte of the data
bus. They are driven by the i860 XP microprocessor
with even parity information on writes with the same
timing as write data. Likewise, if parity checking is
enabled by PEN#, the system must drive even parity
information on these pins with the same timing as
read information to ensure that the correct parity
check status is indicated by the i860 XP
microprocessor. "Even parity" means that the total
number of set bits in a byte, including the parity bit,
is even. Refer also to the PCHK# signal.


4.2.16 EADS# (EXTERNAL ADDRESS STATUS)


This signal indicates that a valid external address
has been driven onto address pins A31-A5 of the
i860 XP microprocessor to be used for a cache
inquiry. This signal is recognized while the processor
is in hold (HLDA is driven active), while forced off the
bus with BOFF# input, or while AHOLD is asserted.
The i860 XP microprocessor ignores EADS# at all
other times. EADS# is not recognized if HITM# is
active, nor during the clock after ADS#, nor during
the clock after a valid assertion of EADS#. Table
4.5 shows when EADS# is first sampled. It is then
sampled in every clock as long as the hold remains
active and HITM# remains inactive.


Table 4.5. EADS# Sample Time

| Trigger | EADS# First Sampled               |
| AHOLD   | Second clock after AHOLD asserted |
| HOLD    | First clock after HLDA asserted   |
| BOFF#   | Second clock after BOFF# asserted |


INV and FLINE# are sampled in the same clock
period that EADS# is validly asserted. HIT# and
HITM# may be described as the results of a cache
inquiry.


4.2.17 EWBE# (EXTERNAL WRITE BUFFER
EMPTY)


At RESET, the value on EWBE# determines the
ordering mode. The processor enters strong ordering
mode if EWBE# is sampled active for at least the
last three clocks before RESET deactivates;
otherwise, it enters weak ordering mode.


In weak ordering mode, the value of EWBE# after
reset does not affect processor operation.


In strong ordering mode, the external system asserts
EWBE# as long as all external write buffers are
empty. If an external write buffer is not empty
(EWBE# deasserted) or the internal write buffer is
not empty, the processor delays data cache updates
so as to keep the external order of writes the same
as the programmed order.


In systems that do not have external write buffers,
EWBE# can be tied to Vss, if strong ordering is
desired, or to Vcc, if weak ordering is acceptable.
Refer to sections 5.3.3 and 5.3.4 for more explanation
and for other ways to control write ordering.


4.2.18 FLINE# (FLUSH LINE) 


The system asserts FLINE# to request that the
i860 XP microprocessor write back a modified cache
line before other outstanding bus cycles are
completed, if the line is hit by an external inquiry. If this
pin is active in the same clock that EADS# is
asserted, the write-back cycle is initiated, and the i860 XP
microprocessor expects BRDY#s for the write-back
before outstanding cycles (if any) are returned. If
data transfer for another cycle is currently in
progress when FLINE# is asserted (i.e. first BRDY#
returned before HITM# asserted), the i860 XP
microprocessor waits until the data transfers for that burst
have completed, and only then does it assert the
ADS# for the write-back. If the first BRDY# has not
yet occurred for an outstanding cycle, NA# must be
activated to trigger ADS# for the write-back.


At RESET, the value on FLINE# determines
configuration. The processor enters one-clock late
back-off mode if FLINE# is sampled active for at least the
last three clocks before RESET deactivates.


4.2.19 HIT# (CACHE INQUIRY HIT) 


This pin is one output of inquiry cycles. If an inquiry
cycle hits a valid line in the caches of the i860 XP
microprocessor (either data or instruction), HIT# is
asserted two clocks after EADS# is activated. If the
inquiry cycle misses the caches, this pin is negated
two clocks after EADS# activation.


This pin changes its value only as a result of EADS#
activation during AHOLD, HOLD, or BOFF# and
retains its value until two clocks after the next valid
activation of EADS#.


HIT# can be used to control the WB/WT# pin of
other processors in a multiprocessor system.
Activation of HIT# indicates that the inquiring processors
should cache the line as S-state, not E-state.


4.2.20 HITM# (HIT MODIFIED LINE) 


This pin is an output of inquiry cycles. When an in- 
quiry hits a modified line in the internal data cache, 
the i860 XP microprocessor asserts HITM# two 
clocks after EADS# is activated. (Refer also to the 
EADS# signal.) The HITM# signal stays active until 
the last BRDY# for the corresponding write-back 
cycle. At all other times, HITM# is inactive. HIT# is 
also asserted when HITM# is asserted (except for 
the special case of an inquiry after the ADS# of a 
write-back).
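

The relationship between the state of the inquired line and the
HIT# and HITM# outputs can be sketched as below. This is a
simplified illustration: the MESI-style state names are an
assumption of the sketch, the two-clock delay after EADS# is not
modeled, and the special case of an inquiry after the ADS# of a
write-back is ignored.

```python
from enum import Enum


class LineState(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def inquiry_outputs(state: LineState) -> tuple[bool, bool]:
    """Return (hit_asserted, hitm_asserted) for an inquiry cycle.

    True means the active-low pin is asserted. HIT# is asserted for
    any valid line; HITM# only for a modified line, in which case
    HIT# is asserted as well.
    """
    hit = state is not LineState.INVALID
    hitm = state is LineState.MODIFIED
    return hit, hitm
```

An external controller could use the first element of the result to
decide whether an inquiring processor should cache the line as
S-state rather than E-state.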


4.2.21 HLDA (BUS HOLD ACKNOWLEDGE) 


The i860 XP microprocessor activates HLDA in
response to a hold request presented on the HOLD
pin. Assertion of HLDA indicates that the i860 XP
microprocessor has given the bus to another local
bus master. It is driven active in the same clock that
the i860 XP microprocessor floats its bus. All output
pins are floated except LOCK#, BREQ, HLDA,
PCHK#, HIT#, and HITM#.


The time required to acknowledge a hold request is
one clock plus the number of clocks needed to finish
any outstanding bus cycles (maximum of four
outstanding cycles of four burst transfers each for total
of 16 transfers). If this hold latency is too long for a
given application, BOFF# can be used instead.


When leaving a bus hold, the i860 XP microprocessor
deactivates HLDA and, in the same clock period,
initiates a pending bus cycle, if any.


4.2.22 HOLD (BUS HOLD)


This pin, along with the output signal HLDA, is used 
for local bus arbitration. At some time after the 
HOLD signal is asserted, the i860 XP microproces- 
sor releases control of the local bus and puts most 
bus interface outputs in floating state, then asserts 
HLDA—all during the same clock period. It main- 
tains this state until HOLD is deasserted. Instruction 
execution stops only if required instructions or data 
cannot be read from the on-chip instruction and data 
caches. The i860 XP microprocessor ignores HOLD 
until all outstanding bus cycles are complete (until 
the last BRDY#). The i860 XP microprocessor
recognizes HOLD even during RESET and LOCK#.
HOLD cannot be used when the 82495XP cache 
controller is attached. 


4.2.23 INV (INVALIDATE) 


The external system asserts this signal to invalidate
the cache-line state in the case of an inquiry cycle
hit. It is sampled together with A31-A5 in the clock
EADS# is active.


4.2.24 INT/CS8 (INTERRUPT/CODE-SIZE 
EIGHT BITS) 


This input, like the BERR input, allows interruption of
the current instruction stream. The processor
samples INT at instruction boundaries. If interrupts are
enabled (IM set in psr) when INT is sampled active,
the i860 XP microprocessor fetches the next
instruction from virtual address 0xFFFFFF00. INT is level
triggered. To assure that an interrupt is recognized,
INT should remain asserted until the software
acknowledges the interrupt (by executing an
interrupt-acknowledge cycle, for example). The interrupt may
be ignored by the processor if the INT signal does
not remain active.


Interrupt latency (the maximum time between asser- 
tion of INT and execution of the first instruction of 
the trap handler) depends both on the internal con- 
text and on the external system. After INT is assert- 
ed, the i860 XP microprocessor finishes all instruc- 
tions currently being executed, including any out- 
standing bus cycles, before starting the trap handler. 
The following instruction sequence is an example of 
the worst case: 


pfld.q
pfld.q
ld.l
br
ld.l
st.l


If INT is asserted during the execution stage of the
last ld.l instruction, the execution of the trap handler
may have to wait for:

• Two 2-transfer bursts (the pfld instructions)

• Two data cache line fills (misses by the ld.l
  instructions)

• Two data cache line write-backs (eliminating
  modified lines to open space for the fills)

• Two instruction cache line fills (the target of the
  br and the first instruction of the trap handler)

• Three TLB miss sequences of up to six
  nonpipelined accesses each (the br, the last ld.l, and
  the trap handler)


The time to finish the above bus activities can be
extended by inquiry cycles and associated
write-backs initiated by an external cache or bus
controller.


Besides the bus-related delays, the i860 XP micro- 
processor has internal freeze conditions that can de- 
lay interrupt response by up to 10 additional clocks. 


During a locked sequence, the INT pin is ignored,
and the INT bit of epsr reflects the value on the INT
pin. To limit the time that INT is ignored, the lock
instruction can assert LOCK# for only 30-33
instructions before trapping.


This input is asynchronous, but appropriate setup
and hold times must be met to ensure recognition on
any specific clock.


If INT is asserted for at least the last three clock
periods before the falling edge of RESET, the
i860 XP microprocessor enters eight-bit code-size
(CS8) mode.


4.2.25 KB0, KB1 (CACHE BLOCK)


For reads, these output signals define which cache
block (line) is going to receive the data. For
write-backs, these lines specify which block is being
flushed. They are driven together with cycle
definition for cacheable data reads, TLB replacement,
code fetch cycles, and write-backs. External
hardware can use these signals to observe changes to
cache blocks.


4.2.26 KEN# (CACHE ENABLE)


The i860 XP microprocessor samples KEN# to
determine whether the data being read for the current
cache-miss cycle is to be cached. When the i860 XP
microprocessor generates a read cycle that can be
cached (CACHE# output active) and KEN# is
active, the cycle is transformed into a burst line fill. By
activating KEN#, the memory system commits to a
four-transfer burst. The entire 64 bits of the data bus
are used for the read, regardless of the state of the
byte-enable pins.


If KEN# is sampled inactive, code fetches are not
transferred in bursts, but 128-bit data items may still
be transferred with a burst length of two.


KEN# is sampled together with NA# or BRDY#,
whichever comes first. It is sampled only with the
first BRDY# of a burst; its value at any other time
has no effect.


4.2.27 LEN (DATA LENGTH) 


The LEN output pin specifies the number of burst
transfers for each cycle. This pin and the CACHE#
output pin are used by the system to determine the
burst length for each cycle (refer to Table 4.4). The
i860 XP microprocessor can generate 1, 2, or
4-transfer bursts for reads and writes.


LEN is inactive if the internal request is for 64 bits or
less. If LEN is active, the internal request is for 128
bits or more, and the cycle should be returned as a
two- or four-transfer burst. LEN is always active for
128-bit data accesses. LEN is always inactive for
code accesses.


A cacheable read (CACHE# active) can be
automatically converted to a four-transfer burst
regardless of LEN by assertion of KEN#.


Table 4.4 summarizes different cycle lengths as they
are calculated from the LEN and CACHE# signals.
LEN has the same timing as the address.
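

The way LEN, CACHE#, and KEN# combine into a burst length can be
sketched as follows. This is one reading of the prose in this
section and of the CACHE# and KEN# descriptions, not a substitute
for Table 4.4; the function name and boolean parameters are
hypothetical, and the four-transfer length for write-backs assumes
a full 32-byte line on the 64-bit bus.

```python
def burst_transfers(is_write: bool, len_active: bool,
                    cache_active: bool, ken_active: bool = False) -> int:
    """Estimate the number of BRDY# transfers for one bus cycle.

    All arguments are True when the corresponding signal is
    asserted (active), independent of its electrical polarity.
    """
    if cache_active:
        if is_write:
            # CACHE# on a write marks a cache line write-back
            # (assumed: whole 32-byte line, four 64-bit transfers).
            return 4
        if ken_active:
            # Cacheable read accepted by the system: burst line fill.
            return 4
    # Otherwise the length follows LEN: 128-bit items take two
    # transfers, items of 64 bits or less take one.
    return 2 if len_active else 1
```

Under these assumptions, a noncacheable 128-bit read returns 2, and
a cacheable 64-bit read with KEN# active returns 4.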


4.2.28 LOCK# (ADDRESS LOCK) 


This signal is used to provide atomic (indivisible)
read-modify-write sequences in multiprocessor
systems. The address to be locked is the one being
driven on A31-A3 when LOCK# is activated. A
multiprocessor bus arbiter must permit only one
processor a locked read, locked write, or unlocked write to
that address and must maintain the lock of that
location across cycle boundaries until LOCK#
deactivates. The simplest arbitration hardware can just
lock the entire bus against all other accesses during
LOCK# assertion; however, software must never
assume that this implementation is being used.


The i860 XP microprocessor coordinates the
external LOCK# signal with the lock and unlock
instructions. Programmers do not have to be
concerned about the fact that bus activity is not always
synchronous with instruction execution. LOCK# is
asserted with ADS# for the address operand of the
first load or store instruction executed after the lock
instruction.


After an unlock instruction, LOCK# is deasserted
with the next load or store. The i860 XP
microprocessor deactivates LOCK# one clock after ADS# for
the last locked bus cycle. Unlike the i860 XR
microprocessor, the i860 XP microprocessor does not
deassert LOCK# immediately when a trap occurs.
Instead, the trap handler must execute a load or
store instruction to deassert LOCK#. (The handler
does not have to execute an unlock instruction,
however. The unlocking function is performed by the
processor's trap logic.)


The i860 XP microprocessor also asserts LOCK#
during TLB miss processing for updates of the
accessed bit in page-directory and page-table entries.
The maximum time that LOCK# can be asserted in
this case is the time required to perform a
nonpipelined, four-byte, read-modify-write sequence.


Between locked sequences, at least one cycle of no
LOCK# is guaranteed by the behavior of the unlock
instruction.


Between lock and unlock instructions, the INT pin is
ignored.


Instruction fetches do not alter the LOCK# signal.


4.2.29 M/IO# (MEMORY-I/O)


M/IO# specifies whether the current cycle is for the
memory address space or for the I/O address
space. M/IO# is one of the bus cycle definition pins.
Tables 4.2 and 4.3 show the types of bus cycle
generated. The value of this pin changes only when
ADS# is asserted.


4.2.30 NA# (NEXT ADDRESS REQUEST) 


NA# makes address pipelining possible. The
system asserts NA# for at least one clock to indicate
that it is ready to accept the next address from the
i860 XP microprocessor. (If the system does not
implement pipelining, NA# must not be activated.) The
i860 XP microprocessor samples NA# every clock,
starting one clock after the activation of ADS#. If
the i860 XP microprocessor has a new cycle
pending internally when NA# is activated, it initiates that
cycle in the clock after NA# is asserted. Up to three
bus cycles can be outstanding simultaneously.


NA# is latched internally; the i860 XP
microprocessor remembers that NA# was asserted until it has
an internal request to send to the bus; so, assertion
of NA# for a single clock can trigger an ADS#
several clocks later. NA# is ignored in the clock of
ADS#.


KEN# and WB/WT# inputs for the current cycle
are sampled with NA#, if NA# is asserted before
the first BRDY# of the current cycle.


NA# is also used in conjunction with FLINE# to
invoke write-back of a modified line during
outstanding bus cycles.


4.2.31 NENE# (NEXT NEAR) 


The i860 XP microprocessor asserts NENE# when
the current address is in the same DRAM page as
the previous bus cycle. This signal allows
higher-speed reads and writes in the case of consecutive
accesses to static column or page-mode DRAMs.
The i860 XP microprocessor determines the DRAM
page size by inspecting the software-controlled DPS
field in the dirbase register. The page size can
range from 2^9 to 2^16 64-bit words, supporting DRAM
sizes from 256K × 1 to 4G × n. The value of this
pin changes only when ADS# is asserted. NENE#
is never asserted for the next bus cycle after the
address bus has been floating (after AHOLD,
BOFF#, or HLDA is deasserted).
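

The same-page comparison that NENE# reports can be sketched as
below. The encoding of the DPS field is not given in this section,
so the sketch takes the page size directly as a base-2 logarithm
of the number of 64-bit words; that parameter and the function
name are assumptions. The rule that NENE# is never asserted after
the address bus has been floating is not modeled.

```python
def same_dram_page(prev_addr: int, cur_addr: int,
                   log2_page_words: int) -> bool:
    """Return True if two byte addresses fall in the same DRAM page.

    log2_page_words is the log2 of the page size in 64-bit words,
    which this section limits to the range 9..16.
    """
    if not 9 <= log2_page_words <= 16:
        raise ValueError("page size must be 2^9 to 2^16 64-bit words")
    # Each 64-bit word is 8 bytes, so the in-page byte offset
    # occupies log2_page_words + 3 low-order address bits.
    shift = log2_page_words + 3
    return (prev_addr >> shift) == (cur_addr >> shift)
```

With the smallest page (2^9 words, 4 Kbytes), addresses 0x0FFF and
0x1000 are in different pages; with the largest (2^16 words) they
share a page.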


4.2.32 PCD (PAGE CACHE DISABLE)


PCD provides a cacheability indication on a page by
page basis. This signal, together with PWT, is set to
an attribute bit in the page table entry for the current
cycle. When paging is enabled, PCD corresponds to
the CD bit (bit 4) of the page table entry. The i860 XP
microprocessor does not perform a cache fill to any
page for which CD of the page table entry is set.
When paging is disabled, or for any cycle that is not
paged (ldio, stio, ldint, scyc), the i860 XP
microprocessor drives PCD inactive.


During TLB miss processing, PCD is inactive while 
the address translation hardware is accessing the 
first level page directory. During accesses to the 
second-level page-table entry, PCD reflects the CD 
values taken from the first level page-table entry. 


The value of this pin changes only when ADS# is 
asserted. 
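

As an illustration of how the page attribute outputs follow the
page table entry, the sketch below derives PCD (from the CD bit,
bit 4) and PWT (from the WT bit, bit 3, as described under PWT)
for an ordinary paged access. The function name and arguments are
hypothetical, and the special handling during TLB miss processing
is not modeled.

```python
CD_BIT = 4  # cache-disable attribute bit of the page table entry
WT_BIT = 3  # write-through attribute bit of the page table entry


def page_attribute_pins(pte: int, paging_enabled: bool,
                        paged_cycle: bool = True) -> tuple[bool, bool]:
    """Return (pcd_active, pwt_active) for a bus cycle.

    paged_cycle is False for cycles that are not paged
    (ldio, stio, ldint, scyc); both pins are then inactive,
    as they are when paging is disabled.
    """
    if not paging_enabled or not paged_cycle:
        return False, False
    pcd = bool((pte >> CD_BIT) & 1)
    pwt = bool((pte >> WT_BIT) & 1)
    return pcd, pwt
```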
4.2.33 PCHK# (PARITY CHECK) 


This output shows the result of the parity check on
data pins in the previous clock of a read cycle. It is
asserted for one clock when incorrect parity has
been detected. It reflects the parity status for the
entire data bus.


PCHK# does not terminate outstanding bus cycles,
so the system must still activate BRDY# a sufficient
number of times or activate BOFF# for those
cycles. PCHK# is always inactive after any code fetch
in CS8 mode.
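

The even-parity rule used for DP7-DP0 and checked by PCHK# can be
sketched as follows. The pairing of DPn with data byte n (DP0 with
D7-D0, up to DP7 with D63-D56) is an assumption of this sketch, as
are the function names.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total count of set bits even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(data64: int) -> list[int]:
    """Parity bits for the eight bytes of a 64-bit value.

    Element n corresponds to bits 8n+7..8n (assumed to be DPn).
    """
    return [even_parity_bit((data64 >> (8 * n)) & 0xFF) for n in range(8)]


def parity_error(data64: int, dp: list[int]) -> bool:
    """True if any byte of data64 disagrees with its parity bit."""
    return data_parity(data64) != list(dp)
```

A memory system model could call `data_parity` to drive the parity
pins on reads, and `parity_error` mirrors the condition under which
PCHK# would be asserted.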


4.2.34 PCYC (PAGE CYCLE) 


The page cycle line is active during memory read or 
write cycles to distinguish page-table accesses from 
other accesses. The types of bus cycle generated 
are indicated in Tables 4.2 and 4.3. The value of this 
pin changes only when ADS# is asserted.


4.2.35 PEN# (PARITY ENABLE)


The i860 XP microprocessor samples this signal for
read cycles on the same clock edge at which
BRDY# is found asserted. If sampled active, the
i860 XP microprocessor feeds the parity check
result into the interrupt logic. If a parity error is
encountered, the i860 XP microprocessor vectors to the
trap handler. The BEAR register latches the
offending address, as described with the BERR signal.
This interrupt is not masked by the IM bit of the PSR,
nor is it masked during lock cycles.


The system should deassert PEN# any time the 
DP7-DP0 pins are known not to reflect the parity of 
the full eight-byte bus (for example, reads from I/O 
devices or ROMs that are not parity protected). 


The system should deassert PEN# during code 
fetches in CS8 mode. 


At RESET, the value of PEN# determines the out- 
put buffers configuration for ADS#, A21-A3, 
BE7#-BE0#, W/R#, HITM#. These pins are configured
as normal (small output buffers) mode if
PEN# is sampled active for at least the last three 
clocks before RESET deactivates. Otherwise, these 
pins are configured as high-current mode (large out- 
put buffers). 
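

Several inputs (EWBE#, FLINE#, INT/CS8, and PEN#) select a
configuration when sampled active for at least the last three
clocks before RESET deactivates. The sketch below models that rule
for a simulation or test bench. The sample lists, function names,
and dictionary keys are hypothetical; the sketch assumes that
signals marked # are active LOW and that INT is active HIGH.

```python
def strap_active(samples: list[int], active_low: bool = True,
                 clocks: int = 3) -> bool:
    """True if a pin was active for the last `clocks` samples.

    samples holds one logic level (0 or 1) per clock, ending with
    the last clock before RESET deactivates.
    """
    if len(samples) < clocks:
        return False
    active_level = 0 if active_low else 1
    return all(level == active_level for level in samples[-clocks:])


def reset_configuration(ewbe: list[int], fline: list[int],
                        int_cs8: list[int], pen: list[int]) -> dict[str, bool]:
    """Decode the reset-time options from sampled pin levels."""
    return {
        "strong_ordering": strap_active(ewbe),
        "late_back_off": strap_active(fline),
        "cs8_mode": strap_active(int_cs8, active_low=False),
        "small_output_buffers": strap_active(pen),
    }
```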


4.2.36 PWT (PAGE WRITE-THROUGH)


PWT provides a write-back/write-through indication
on a page by page basis. This signal, together with
PCD, is set to an attribute bit in the page table entry
for the current cycle. When paging is enabled, PWT
corresponds to the WT bit (bit 3), and write-back
caching is implemented for this page only if WT is
clear. When paging is disabled, or for any cycle that
is not paged (ldio, stio, ldint, scyc), the i860 XP
microprocessor drives PWT inactive.


During TLB miss processing, PWT is inactive while
the address translation hardware is accessing the
first level page directory. During accesses to the
second-level page-table entry, PWT reflects the WT
value taken from the first level page-table entry.


The value of this pin changes only when ADS# is
asserted.


4.2.37 RESET (SYSTEM RESET)


Asserting RESET for at least ten CLK periods
causes initialization of the i860 XP microprocessor. On
power up, RESET should remain active at least one
millisecond after Vcc and CLK have reached their
proper DC and AC specs. RESET is synchronous
with CLK.


After the RESET signal goes inactive the processor
remains in the RESET state for three more clocks.
Applications that use the HOLD signal to float the
bus during RESET should keep HOLD active for
three more clocks after the RESET signal is
deactivated.


4.2.38 RSRVD, SPARE


The RSRVD input is reserved by Intel Corporation
and must be tied HIGH to Vcc through a resistor
(5 KΩ). The spare input should be left unconnected.


4.2.39 TCK (TEST CLOCK)


This is the clock input for the TAP (test access port).
If the TAP is to be used, this signal must be
connected to a clock synchronous to CLK. If the TAP is not
used, TCK can be tied low. TCK does not need to be
kept running when boundary scan is not active.


The rising edge of TCK must be externally
synchronized to CLK. The boundary scan latches retain their
state when TCK is stopped at either logic zero or
one.


4.2.40 TDI (TEST DATA INPUT)


TDI is the input for test instructions and data to the
TAP. TDI is sampled on the rising edge of TCK. It is
provided with an internal pull-up resistor, so that an
open circuit at TDI produces a result equivalent to
driving continuous HIGH signals.


4.2.41 TDO (TEST DATA OUTPUT) 


This is the serial output of the TAP. The contents of
TAP registers are shifted out through TDO on the
falling edge of TCK. The data is moved from TDI to
TDO without inversion, which allows easy serial
cascading of different components for scanning.


TDO is held in high-impedance state, except while
scanning is in progress. This allows parallel
connection of these outputs for several components.


4.2.42 TMS (TEST MODE SELECT)


This input is decoded by the TAP to select the oper- 
ation of the TAP. It is sampled at the rising edge of 
TCK. It is provided with an internal pull-up resistor to 
assure deterministic behavior for open-circuit failure 
at this pin. If boundary scan is not used, TMS can be 
tied high or left unconnected.


4.2.43 TRST# (TEST RESET) 


This input resets the TAP. If the TAP is not used,
TRST# should be tied LOW. To ensure deterministic
behavior of the test logic, TMS should be held
HIGH while TRST# changes from LOW to HIGH.


4.2.44 Vcc (SYSTEM POWER) AND Vss
(GROUND)


The i860 XP microprocessor has 54 pins for power
and 56 for ground. All pins must be connected to the
appropriate low-inductance power and ground
signals in the system.


4.2.45 VccCLK (CLOCK POWER)


This is the power supply for the internal CLK buffer.
It should be connected to the same Vcc plane as
the other Vcc pins.


4.2.46 WB/WT# (WRITE-BACK/WRITE- 
THROUGH) 


This input signal defines cache policy for the line
being accessed in the current bus cycle. The
processor samples WB/WT# for both reads and writes
on the same clock edge at which it finds NA# or the
first BRDY# asserted, whichever comes first. If this
signal is sampled low, the write-through policy is
applied to the cache line: if an internal write hits this
line, it causes a write-through cycle. If this signal is
sampled high, the write-back policy is applied: future
write hits to this line do not show up on the bus.


4.2.47 W/R# (WRITE/READ)


This pin specifies whether a bus cycle is a read
(LOW) or write (HIGH) cycle. Tables 4.2 and 4.3
show the types of bus cycle generated. The value of
this pin changes only when ADS# is asserted.


5.0 BUS OPERATION


The interaction among signals is illustrated by timing
diagrams. Figure 5.1 shows the conventions used in
the timing diagrams.


5.1 Bus Cycles


A bus cycle begins when the i860 XP microprocessor
activates ADS# and ends when the system
activates the last of a predetermined number of BRDY#
signals. Figure 4.4 shows how the i860 XP
microprocessor and the external system cooperate to
determine the number of BRDY# activations in each
cycle. The processor starts sampling BRDY# one
clock after assertion of ADS# and continues
sampling in every clock until the last BRDY# becomes
active.


The i860 XP microprocessor supports several
different types of bus cycle. These are introduced in order
of complexity:

1. Single-transfer cycles
2. Multiple-transfer (burst) cycles
3. Pipelined cycles
4. Cache inquiry cycles


SIGNAL ID NOTES:
1. HIGH (high voltage)
2. Don't care or undefined
3. LOW (low voltage)
4. High-impedance (floating)
5. Either HIGH or LOW


Figure 5.1. Timing Diagram Conventions


5.1.1 SINGLE-TRANSFER CYCLE


The simplest bus cycle is the single-transfer,
noncacheable, 64-bit cycle either with or without wait
states. The shortest bus cycle is two clock periods
long. Read and write cycles of this type are shown in
Figure 5.2.


A wait state is any clock in which the i860 XP
microprocessor samples BRDY# but the system does not
assert it. The system can add wait states to any
cycle. Figure 5.3 shows cycles with two wait states
added. Any number of wait states can be added to
i860 XP microprocessor bus cycles by maintaining
BRDY# inactive.


5.1.2 BURST CYCLES


When a bus request requires more than a single
data transfer (refer to Table 4.4), the i860 XP
microprocessor requires that the memory system perform
a burst data transfer. Burst cycles allow the
maximum bus transfer rate by eliminating unnecessary
driving of the address bus. The addresses of the
data items in burst cycles all fall within the same
32-byte aligned area (corresponding to an internal
i860 XP microprocessor cache line). Given the
address of the first transfer, external hardware can
calculate the addresses of subsequent transfers. With
these addresses eliminated from the bus, a new
data item can be sampled into the i860 XP
microprocessor every clock period.
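

As a sketch of the address calculation external hardware performs,
the function below lists the four 8-byte transfer addresses inside
the 32-byte aligned area that contains the first address of a
four-transfer burst. The function name is hypothetical, and the
list is in ascending order only; it does not reproduce the order
in which the items are returned on the bus.

```python
LINE_BYTES = 32      # size of the aligned area (one cache line)
TRANSFER_BYTES = 8   # one 64-bit data transfer


def line_transfer_addresses(first_addr: int) -> list[int]:
    """Return the 8-byte-aligned addresses of the line holding first_addr."""
    base = first_addr & ~(LINE_BYTES - 1)
    return [base + offset for offset in range(0, LINE_BYTES, TRANSFER_BYTES)]
```

For a first address of 0x1018 this gives 0x1000, 0x1008, 0x1010,
and 0x1018.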


Figure 5.2. Fastest Single-Transfer Cycles


Figure 5.3. Single-Transfer Cycles with Wait States


The fastest possible burst cycle requires two clock
periods for the first data item: one clock for ADS#
and one clock for BRDY#; subsequent data items
are transferred every clock period. One such bus
cycle is shown in Figure 5.4. Note that, in this case,
the initial cycle generated by the i860 XP
microprocessor could be satisfied by a single data
transfer, but the system transforms it into a
multiple-transfer cache line fill by activating KEN#
in the clock period of the first BRDY#. KEN# has this
effect only if the CACHE# pin is active, which means
the cycle is internally cacheable in the i860 XP
microprocessor.


Read data is sampled only in the clock period in
which BRDY# is returned, which means that data
need not be sent to the i860 XP microprocessor
every clock period in the burst cycle. Figure 5.5 shows
an example of a burst cycle in which two clock
periods are required for every burst item.
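
The burst timing described above can be sketched as a small clock count. This is only an illustration: the function name and both parameters are hypothetical, and the slow case assumes that every item, including the first, takes the same number of clocks after ADS#.

```python
def burst_clocks(transfers, clocks_per_item=1):
    """Clocks from ADS# through the last BRDY# of a burst.

    transfers: number of data items in the burst (for example 2 or 4).
    clocks_per_item: clocks the system takes to return each item;
    1 models the fastest burst, 2 models a burst in which every item
    needs two clocks. Both parameters are illustrative assumptions.
    """
    if transfers < 1 or clocks_per_item < 1:
        raise ValueError("transfers and clocks_per_item must be >= 1")
    # One clock for ADS#, then clocks_per_item clocks for each item.
    return 1 + transfers * clocks_per_item
```

For a four-transfer line fill the fastest case comes to five clocks: two for the first item and one for each of the remaining three.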


The burst length attributes LEN and CACHE# are
driven with the address. Figure 5.6 illustrates two
consecutive burst cycles with differing length
attributes: the first one is a noncacheable 128-bit read,
and the second one is a cache line fill initiated by a
cacheable 64-bit read.


NOTE:
1. KEN# driven with first assertion of BRDY#


Figure 5.4. Basic Burst Cycle


NOTE:
1. Wait states added by delaying assertion of BRDY#


Figure 5.5. Slow Burst Cycle


Figure 5.6. Different Lengths of Burst Cycles


The timing of write bursts is similar to that of read
bursts. The i860 XP microprocessor does not put
data on D63-D0 for writes until the clock period
after ADS#.


When initiating any read, the i860 XP microprocessor
presents the address for the data item requested.
When the cycle is converted into a cache fill, the
first data item returned corresponds to the address
sent out by the i860 XP microprocessor. The remaining
items must be returned in the order shown in
Table 5.1. This ordering is optimized for two-bank
memories, but works equally well with noninterleaved
memories.


In i860 XP microprocessor systems, memory must 
support the burst order as defined in Table 5.1 for 
reads. For writes, the burst addresses are always 
increasing, so writes with four transfers match the 
first line of the table. In CS8 (code-size 8 bits) mode, 
instructions are not fetched in bursts. 


Note that the i860 XP microprocessor drives only
the first address of a burst cycle; the memory system
is responsible for calculating subsequent addresses
as shown in the table. The addresses can
be derived by complementing A3 after every transfer,
and complementing A4 after two transfers.


Table 5.1. Burst Order for Cache Line Transfers 


1st        2nd        3rd        4th
Address    Address    Address    Address

0          8          0x10       0x18
8          0          0x18       0x10
0x10       0x18       0          8
0x18       0x10       8          0
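
The rule of complementing A3 after every transfer and A4 after two transfers is equivalent to XOR-ing the first offset with 0, 8, 0x10 and 0x18 in turn. A minimal sketch of how external hardware might compute the read burst order of Table 5.1 follows; the function and constant names are hypothetical.

```python
LINE_SIZE = 32  # bytes in one cache line
ITEM_SIZE = 8   # bytes in one 64-bit transfer

def read_burst_order(first_address):
    """Return the four transfer addresses of a cache line fill.

    first_address is the address driven by the processor with ADS#.
    Toggling A3 after every transfer and A4 after two transfers is
    the same as XOR-ing the first offset with 0x00, 0x08, 0x10, 0x18.
    """
    line_base = first_address & ~(LINE_SIZE - 1)
    first_offset = first_address & (LINE_SIZE - 1) & ~(ITEM_SIZE - 1)
    return [line_base | (first_offset ^ step)
            for step in (0x00, 0x08, 0x10, 0x18)]
```

Each row of Table 5.1 corresponds to one possible value of the first offset within the 32-byte line.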


5.1.3 PIPELINED CYCLES 


A pipelined cycle is one that starts while one or two
other bus cycles are outstanding. A cycle is considered
outstanding until the last BRDY# is asserted to
terminate that cycle. A nonpipelined cycle is one
that starts when no other bus cycles are outstanding.
Both types of cycle can be either read or write
cycles. To allow high transfer rates in large memory
systems, the i860 XP microprocessor supports two-level
pipelining. New cycles can start as often as
every other clock until three cycles are outstanding.


The system asserts NA# to indicate that the
i860 XP microprocessor can start another cycle before
the current one is completed. (NA# can even
be asserted while BRDY# is active.) The i860 XP
microprocessor begins sampling NA# in the next
clock after ADS# is asserted. If the following
conditions are met, a new (pipelined) cycle begins:

1. NA# having been active
2. An internal request pending
3. Compatibility between the pending request and
   the outstanding requests (refer to Table 5.2)
4. HOLD, BOFF#, and AHOLD not active
5. Fewer than three cycles outstanding


The following "compatibility" rules determine when
the processor does not issue a pipelined ADS#
(they are the source of Table 5.2):

• Data cache line fills are pipelined into each other
  only in the case of an aliasing virtual tag miss with
  a physical tag hit.

• Reads can be pipelined into TLB miss writes. TLB
  misses for instructions can be pipelined into data
  accesses, and vice versa.

• No data cycle is ever pipelined while LOCK# is
  active.

• I/O cycles, special cycles, and ldint cycles never
  begin when any cycle is outstanding.


NA# may be asserted before, simultaneously with,
or after the first BRDY# of the current cycle. If NA#
is asserted before the first BRDY#, the cacheability
(KEN#) and cache policy (WB/WT#) indicators for
the current cycle are sampled during the same clock
period as NA# is sampled active; otherwise, they
are sampled with the first BRDY#. Figure 5.7 shows
an example of four-transfer, pipelined, back-to-back
reads. Note the timing of KEN#. Because NA# is
asserted before the first BRDY# of the cycle A,
KEN# is sampled with the NA# for cycle B.
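
The five conditions for starting a pipelined cycle can be summarized as a single predicate. The sketch below is an illustration only: the function and parameter names are hypothetical, and the `compatible` flag stands in for a lookup in Table 5.2 that is not modeled here.

```python
def can_start_pipelined_cycle(na_was_active, request_pending, compatible,
                              hold, boff, ahold, outstanding):
    """Return True if a new pipelined cycle may begin.

    na_was_active:   NA# has been sampled active for the current cycle
    request_pending: an internal bus request is waiting
    compatible:      result of the Table 5.2 check (not modeled here)
    hold/boff/ahold: True while the corresponding pin is active
    outstanding:     number of bus cycles currently outstanding
    """
    return (na_was_active
            and request_pending
            and compatible
            and not (hold or boff or ahold)
            and outstanding < 3)
```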


Table 5.2. Pipeline Cycle Compatibility 


If A is Outstanding, can B be Pipelined into It? 


[Table body illegible in this copy. Cycle types named in the
row and column headings: Data Cache Line Fill; Data Cache
Store Miss, Write-Thru; Data Cache Read Miss (KEN# = 1);
Instruction Fetch; pfld; ldio, stio, ldint, scyc; LOCK#
Active. Legible entries: Data Cache Line Fill into Data
Cache Line Fill is YES*; the Instruction Fetch row reads
YES in every legible column.]


NOTES:
*  Pipelining can occur if the first ADS# is for an aliasing virtual tag miss with a physical tag hit.
** Inquiry write-backs are not pipelined into prior cycle unless FLINE# is asserted.


NOTES: 

A Four-transfer, cache line fill cycle 

B Four-transfer, cache line fill cycle 

1. KEN# for A simultaneous with NA# 


Figure 5.7. Pipelined Cache Line Fills 


Write cycles can be pipelined into read cycles and 
vice versa, but, in both cases, the processor will 
leave one clock between bursts to allow bus turn- 
over, and will ignore any BRDY # given to it at that 
time. Pipelined back-to-back read and write cycles 
are shown in Figure 5.8. On writes, assertion of NA# 
does not cause the values on the data bus to 
change; it just enables new address and cycle speci- 
fication outputs. 


5.1.4 INTERRUPT ACKNOWLEDGE CYCLES 


In response to a trap caused by assertion of the INT
pin, trap-handling software can generate interrupt
acknowledge cycles by executing a procedure similar
to the following.


    ldint.b  src2, rdest       // First INTA cycle. Src2 contains 8.
    or       rdest, r0, rdest  // Won't proceed until rdest loaded.
    unlock                     // Unlock the bus after the next ldint
    //nop                         Insert 4 + <number of NOPs> idle
    //nop                         clocks for 8259A recovery.
    ldint.b  r0, rdest         // Second INTA cycle


NOTES:
R Two-transfer, noncacheable read cycle
W Two-transfer, noncacheable write cycle
1. Idle clock for bus turnaround
2. Second assertion of NA# could be here


Figure 5.8. Pipelined Back-to-Back Read and Write Cycles


Figure 5.9 shows the interrupt acknowledge cycles
generated by the code sequence. Interrupt acknowledge
cycles are generated in locked pairs. The interrupt
vector is returned during the second cycle. Each
of the interrupt acknowledge cycles is terminated
when the external system responds by asserting
BRDY#. Wait states can be added by withholding
BRDY#. There must be a number of idle clocks
between the first and second cycles to allow for 8259A
recovery time. The software controls the number of
intervening clocks via the number of nop instructions
in the interrupt acknowledge routine.


Figure 5.9. Example Interrupt Acknowledge Sequence


5.1.5 SPECIAL BUS CYCLES


The i860 XP microprocessor provides a special cycle
to indicate to the external system that certain
internal conditions have occurred. The special bus
cycle (indicated by M/IO# = 0, D/C# = 0, and
W/R# = 1) is generated by the i860 XP microprocessor
as a response to scyc instruction execution.
This cycle (defined in Table 5.3) is used to flush or
invalidate a secondary cache. The defined value of
byte enables can be generated by using an appropriate
address operand in the scyc instruction. The
scyc instruction does not have any effect on the
internal caches. External hardware must acknowledge
a special bus cycle by asserting BRDY# once.
The data driven on the data bus with BRDY# is
undefined. The effect of scyc is determined by
decoders in external hardware.


Table 5.3. Encoding of Special Bus Cycles 


BE7#-BE0#    Special Bus Cycle


11110111
11111011
11111101


Halt
Write Back External Cache and Invalidate
Invalidate External Cache


11111110     Shut Down

All other encodings are reserved.


5.2 Bus Arbitration 


The i860 XP microprocessor responds to three different
signals that tell it to stop driving the bus:


HOLD     Finishes outstanding cycles before giving
         up the bus.

BOFF#    Aborts outstanding cycles and gives up bus
         immediately.

AHOLD    Stops driving address bus and permits a
         cache inquiry.


AHOLD results in a partial hold state, which is covered
in Section 5.3. The present section concentrates
on HOLD and BOFF#.


When in a hold state (due either to HOLD or
BOFF#), the i860 XP microprocessor uses BREQ to
request control of the bus. If holding due to HOLD,
AHOLD, or BOFF#, the processor activates BREQ
in the clock after an internal bus request is generated.
(In the case of HOLD, BREQ is asserted even
though HLDA is asserted.) If holding due to BOFF#
and cycles need to be restarted or there is a new
internal request, it asserts the BREQ signal within
four clock periods after the assertion of BOFF#. In
all cases, BREQ remains active at least until the
clock after ADS# is activated for the requested cycle.


5.2.1 HOLD AND HLDA ARBITRATION 


HOLD indicates to the i860 XP microprocessor that
another bus master needs control of the bus. When
HOLD is asserted, the i860 XP microprocessor
keeps control of the bus until all outstanding cycles
are completed. Then it floats the output signals
(except BREQ, HLDA, LOCK#, PCHK#, HIT#, and
HITM#) and asserts HLDA. These outputs remain at
the high-impedance state until HOLD is deasserted.


HLDA may be asserted as soon as the clock period
after the one in which HOLD is asserted. HLDA may
be deasserted as soon as the clock after the one in
which HOLD is deasserted.


An example HOLD/HLDA transaction is shown in
Figure 5.10. The i860 XP microprocessor recognizes
HOLD even while RESET is asserted, and it drives
HLDA in this case as well.


HOLD is recognized even when BOFF# is active,
and the i860 XP microprocessor responds with
HLDA the same as when the bus is idle.


Figure 5.10. HOLD/HLDA Handshake


5.2.2 BUS CYCLE BACK-OFF AND RESTART


The i860 XP microprocessor provides the ability to
abort bus cycles and restart them. It is necessary
to abort cycles for reasons such as the following:

1. Retry after an error is detected by ECC or parity
   logic.

2. Escape from a deadlock; for example, when the
   i860 XP microprocessor is using A31-A3 to load
   a new cache line, but the 82495XP cache controller
   needs A31-A5 to invalidate a line in the
   CPU cache which the 82495XP cache controller
   is replacing in its cache in order to satisfy the
   CPU's line-fill request.

3. Maintain cache consistency; for example, the
   i860 XP microprocessor is attempting to read or
   write to a line that has been modified in the cache
   of another CPU.

4. Prevent illegal access to an address already
   locked by another CPU in a multiprocessor system.


5.2.2.1 Cycle Back-Off


Bus cycles are aborted when the system asserts
BOFF#. The i860 XP microprocessor samples this
pin in every clock period that it is driving the bus.
When BOFF# is asserted, the i860 XP microprocessor
immediately (in the next clock period) floats the
bus. It floats the ADS# pin one clock period later,
thereby giving time for ADS# to be deasserted so
that it is not left floating active. The i860 XP
microprocessor floats the same pins as for HOLD, but
HLDA is not asserted. If a bus cycle is in progress at
the time BOFF# is asserted, the cycle is aborted,
and, in a read cycle, any data returned to the processor
while BOFF# is active is ignored. BOFF#
overrides BRDY#; so, if both are sampled active in
the same clock, BRDY# is ignored. BOFF# aborts
a burst cycle even if it arrives with the last BRDY#
of the cycle. However, for read bursts, data transfers
completed before assertion of BOFF# are used by
the processor if they satisfy an internal request.
Cacheable data is cached in spite of BOFF#; however,
the cached data is overwritten when the cycle
is restarted.


The bus remains in the high-impedance state until 
BOFF # is deasserted. If cycles need to be restarted 
or if a new internal request has been generated, the 
BREQ signal is asserted within four clock periods 
after the assertion of BOFF #. 


5.2.2.2 Cycle Restart 


When the system deasserts BOFF#, the i860 XP
microprocessor restarts aborted bus cycles from the
beginning by driving the address and status (A31-A3,
W/R#, D/C#, etc.) and asserting ADS#. If
more than one cycle was outstanding when BOFF#
was asserted, the i860 XP microprocessor restarts
all outstanding cycles in the same order. If HITM# is
active due to an inquiry, the write-back for it will be
the first cycle after deassertion of BOFF#. BOFF#
restarts all aborted cycles except:

• The stale cycles mentioned in section 5.3.5.

• The read that may have been generated by an
  alias hit (virtual tag miss, but physical tag hit).

• The read that may have been generated by a
  pfld that hit the data cache.


If the processor's KEN# pin was active (with NA#
or first BRDY#) before the cycle was aborted, external
hardware must activate it again after the cycle is
restarted. In other words, the system cannot use
BOFF# to change the cacheability of a cycle via
KEN#.


The LOCK# signal is not affected by restarted cy- 
cles; it retains its state in spite of BOFF# assertion. 


5.2.2.3 Late Back-Off Modes


In some cases the logic that needs to assert
BOFF# cannot make the necessary decision in time
to cancel the relevant cycle or data transfer. For
example:

1. The result of checking ECC or parity may not be
   available until one or two cycles after the BRDY#
   to which it corresponds.

2. When the i860 XP microprocessor is attempting
   to read or write to a line that might be modified in
   the cache of another processor on the same bus,
   it may be advantageous to let part of a burst run
   in parallel with inquiries to the other processors,
   rather than delay the entire burst until the inquiries
   are finished.


For such situations, the i860 XP microprocessor provides
late back-off mode. For a read cycle in this
mode, the processor employs a buffer to internally
delay data and BRDY#, which allows BOFF# assertion
to be delayed relative to the external
BRDY#. Likewise, for a write cycle in this mode,
BOFF# assertion can be delayed relative to
BRDY#. However, data for a write cycle is not
delayed.


Two flavors of late back-off mode are provided:

1. One allows BOFF# to be delayed by one clock
   period relative to the data transfer. The processor
   enters one-clock late back-off mode when
   the FLINE# pin has been sampled active for at
   least three clock periods when RESET deactivates.

2. The other allows BOFF# to be delayed by up to
   two clock periods relative to the data transfer.
   The i860 XP microprocessor enters this mode
   when software sets the LB bit of the dirbase
   register.


If the processor enters one-clock late back-off mode
during RESET, it is impossible to enter two-clock
late back-off mode; the LB bit has no effect.
Furthermore, software cannot exit two-clock late
back-off mode once it is activated, and the LB bit cannot
be cleared except by resetting the processor.
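
The mode-selection rules above can be sketched as a small function. This is an illustrative model, not a description of the processor's internal logic; the function name, the parameters and the integer return values are assumptions.

```python
def late_backoff_mode(fline_clocks_at_reset, lb_bit_set):
    """Return the late back-off delay in clocks: 0, 1 or 2.

    fline_clocks_at_reset: number of clock periods for which FLINE#
        was sampled active when RESET deactivated (hypothetical input).
    lb_bit_set: True once software has set the LB bit of dirbase.
    """
    if fline_clocks_at_reset >= 3:
        # One-clock mode was selected at RESET; the LB bit then has
        # no effect, so two-clock mode cannot be entered.
        return 1
    if lb_bit_set:
        return 2
    return 0  # normal back-off timing
```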


Figures 5.12-5.17 illustrate variations on late back- 
off mode cycles. BOFF# can be (and usually is) as- 
serted longer than one clock period, as Figure 5.11 
shows; the remaining figures show an active time of 
only one clock. 


5.2.2.4 One-Clock Late Back-Off Mode 


In one-clock late back-off mode the data is delayed 
internally by one clock before it is used. 


In this mode, data and BRDY # are seen by internal 
logic one clock period later than they appear on the 
bus, which is equivalent to adding an extra wait state 
to reads on the external bus (Figure 5.13). All re- 
sponses to BRDY # (assertion of the ADS# for the 
next cycle, assertion of HLDA in response to a 
HOLD request, and deassertion of HITM#) are de- 
layed by one clock period compared to the normal 
mode of operation. Not delayed, however, are write 
data on D63-D0 and sampling of KEN# and WB/ 
WT#. KEN# and WB/WT# must be valid with the 
first BRDY # assertion. Also, the response to NA# 
(assertion of ADS#) is not delayed if fewer than 
three pipelined cycles are outstanding. 
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NOTES: 

A Noncacheable, 64-bit cycle (one transfer)
B Next cycle (any type)
1. BOFF# cancels cycle and data transfer
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle


Figure 5.11. Normal Back-Off


NOTES:
A Noncacheable, 64-bit cycle (one transfer)
B Next cycle (any type)
1. BOFF# cancels cycle and data transfer
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle


Figure 5.12. One-Clock Normal Back-Off


NOTE:
1. Idle clock due to internal delay of BRDY#


Figure 5.13. Fastest Nonpipelined Cycles in One-Clock Late Back-Off Mode


If BOFF# is asserted as late as the second BRDY#
(Figure 5.14), it cancels the entire cycle, ignores
data latched with the first BRDY#, and ignores the
data being driven with the second BRDY#. This is
true of a two-transfer burst (shown) as well as a
four-transfer burst (not shown).


In a two-transfer burst, if BOFF# is asserted in the
clock after the second BRDY# (Figure 5.15), it still
cancels the cycle.


In a four-transfer burst, if BOFF# is asserted within 
one clock after the last BRDY # (Figure 5.16), it still 
forces a retry of the cycle, but previously transferred 
read data is used by the processor if it satisfies the 
read request. 


5.2.2.5 Two-Clock Late Back-Off Mode 


Two-clock late back-off mode gives external logic
even more time to decide to use BOFF#. In this
mode, data delivery is delayed by either one or two
clock periods, depending on external activity. For
any BRDY#, the data is delayed by one clock period.
If in the next clock period BRDY# is again asserted,
the previous data is used. However, if in that
next clock period BRDY# remains inactive, the data
is delayed for one extra clock period before it is
used. The responses to BRDY# (assertion of the
ADS# for the next cycle, assertion of HLDA, and
deassertion of HITM#) are delayed by one or two
clock periods, depending on the value of BRDY# in
the next clock. The response to NA# (assertion of
ADS#) is not delayed if fewer than three pipelined
cycles are outstanding.
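
The one-or-two-clock delay rule for two-clock late back-off mode can be sketched as follows. The function name and the list-of-booleans representation of BRDY# are assumptions for illustration, and clocks beyond the end of the list are treated as BRDY# inactive.

```python
def two_clock_delivery(brdy_samples):
    """Return (bus_clock, internal_clock) for each data transfer.

    brdy_samples[i] is True if BRDY# is asserted in clock i. Data is
    used internally one clock after its BRDY# when BRDY# is asserted
    again in the next clock, and two clocks after it otherwise.
    """
    deliveries = []
    for clock, asserted in enumerate(brdy_samples):
        if not asserted:
            continue
        next_asserted = (clock + 1 < len(brdy_samples)
                         and brdy_samples[clock + 1])
        delay = 1 if next_asserted else 2
        deliveries.append((clock, clock + delay))
    return deliveries
```

In this sketch the last transfer of a burst always sees the two-clock delay, because no BRDY# follows it.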


The st.c dirbase instruction that sets the LB bit
must be aligned on a 32-byte boundary and must be
followed by seven nop instructions. Software must
not enable late back-off mode when the processor is
used with the 82495XP external cache controller.
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CACHE# 


NOTES:
A Noncacheable, 128-bit cycle (two transfers)
B Next cycle (any type)
1. BOFF# cancels both transfers (A1 in buffer, A2 on D63-D0)
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle


Figure 5.14. One-Clock Late Back-Off Mode (Case 1)


NOTES:
A Noncacheable, 128-bit cycle (two transfers)
B Next cycle (noncacheable)
1. BOFF# cancels both transfers (A2 in buffer is needed to satisfy request)
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle


Figure 5.15. One-Clock Late Back-Off Mode (Case 2)


NOTES:
A Cacheable 64-bit (or less) cycle (four transfers)
B Next cycle (any type)
1. BOFF# cancels A2 and A3 transfers, but A1 transfer has already satisfied request
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle


Figure 5.16. One-Clock Late Back-Off Mode (Case 3)


Figure 5.17. Two-Clock Late Back-Off Mode


5.3 Cache Inquiry Cycles (Snooping)


Another processor initiates an inquiry cycle to check
whether an address is cached in the internal data or
instruction cache of the i860 XP microprocessor. An
inquiry cycle differs from any other cycle in that it is
initiated externally to the i860 XP microprocessor,
and the signal for beginning the cycle is EADS#
(External Address Status) instead of ADS#. The
address bus of the i860 XP microprocessor is
bidirectional in order to allow the address of inquiry to be
driven by the system. An inquiry cycle can begin
during any hold state:

1. While HOLD and HLDA are asserted.
2. While BOFF# is asserted.
3. While AHOLD (address hold) is asserted.


If neither a HOLD nor a BOFF# is in effect, the
system can assert AHOLD to interrupt the current bus
activity.


EADS# is first sampled two clocks after BOFF# or
AHOLD assertion, or one clock after HLDA. This
allows time for the processor to float A31-A5 and for
the system to stabilize the inquiry address there.


In the clock in which EADS# is asserted, the 
i860 XP microprocessor samples these inputs, 
which qualify the type of inquiry: 


INV      Specifies whether the line (if found) must
         be invalidated (that is, changed to I-state).

FLINE#   Specifies whether the line (if found in
         M-state) must be written back immediately or
         after outstanding bus cycles are completed.


The i860 XP microprocessor compares the address
of the inquiry request with addresses of lines in
cache and of any line in the write-back buffer waiting
to be transferred on the bus. It does not, however,
compare with the address of write-miss data in the
write buffers. Two clock periods after sampling
EADS#, the i860 XP drives the results of the inquiry
look-up on these output pins:


HIT# Specifies whether the address was found 
(active) or not found (inactive). 


HITM#    If active, the line found was in the M-state;
         if inactive, the line was in E- or S-state, or
         was not found.


Figure 5.18 shows an inquiry with AHOLD that misses
the cache. When the system asserts AHOLD, the
i860 XP microprocessor floats A31-A3 in the next
clock period. It does not, however, assert HLDA; no
acknowledge is required. Once the address pins are
floating, external logic drives the address for the
inquiry on A31-A5 and starts the inquiry cycle by
activating EADS#. The i860 XP microprocessor does
not begin sampling EADS# until the second clock
after AHOLD is activated. EADS# activation may be
delayed any number of clocks.


NOTES:
A Outstanding cycle (for example, a single-transfer read) finishes during the inquiry
1. Earliest assertion of EADS# is two clocks after assertion of AHOLD
2. Earliest deassertion of AHOLD is one clock after assertion of EADS#
3. HIT# is valid two clocks after assertion of EADS#
4. Earliest assertion of ADS# for next cycle is one clock after deassertion of AHOLD


Figure 5.18. Inquiry Miss Cycle


The earliest that AHOLD can be deasserted is the
clock after EADS# assertion. However, by maintaining
AHOLD active, multiple inquiry cycles can be
executed in one AHOLD session (Figure 5.19). The
i860 XP microprocessor can accept inquiry cycles at
a rate of one every other clock period, unless a
write-back is required. The earliest that ADS# can
be asserted for the next cycle is the clock after
AHOLD deassertion.


The second inquiry in Figure 5.19 hits an unmodified
line in the cache. When a cache line with matching
address is found and the INV input signal is asserted
(as in this case), that line is invalidated (changed to
I-state). If the INV signal is inactive, the line enters
S-state.


NOTES:
A Outstanding cycle (for example, a single-transfer read) finishes during the inquiry
B Earliest inquiry, no invalidation
C Earliest successive inquiry, with invalidation
1. EADS# is not sampled in the clock after its assertion
2. Inquiry B misses cache
3. Earliest deassertion of AHOLD is one clock after last assertion of EADS#
4. Inquiry C hits cache, invalidates line
5. Earliest assertion of ADS# for next cycle is one clock after deassertion of AHOLD


Figure 5.19. Fastest Inquiry Cycles (Miss and Hit)


5.3.1 INQUIRY WRITE-BACK CYCLES


If an inquiry cycle hits a dirty (M-state) line in the
i860 XP microprocessor cache, the i860 XP
microprocessor asserts the HITM# signal to indicate that
the line will be written on the bus. The HITM# output
becomes valid in the same clock period as HIT#. In
this case the modified line is written out, and the
cache entry is changed to either I or S state according
to INV. The HITM# signal stays active through
the last BRDY# for the corresponding write-back
cycle.


An inquiry write-back cycle is similar to ordinary
write-back cycles. It is initiated by assertion of
ADS#. ADS# is asserted even when the AHOLD
signal is active. The cycle definition signals are
driven properly by the processor; however, the address
pins are not driven, because activation of AHOLD
forces the i860 XP microprocessor off the address
bus. If, however, AHOLD is deasserted before or
during the write-back cycle, the i860 XP microprocessor
drives the correct address for the write-back.
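
The inquiry outcomes described in Sections 5.3 and 5.3.1 can be summarized in a short sketch. It is an illustration under stated assumptions: the function name and the tuple layout are hypothetical, cache states are written as the letters M, E, S and I, and a line that is not found is treated the same as I-state. Lines waiting in the write-back buffer and the timing of the pins are not modeled.

```python
def inquiry_result(state, inv):
    """Return (hit, hitm, new_state, write_back) for one inquiry.

    state: 'M', 'E', 'S' or 'I' for the line at the inquiry address.
    inv:   True if INV was sampled active with EADS#.
    hit and hitm are True when HIT# and HITM# would be driven active.
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError("unknown cache state: %r" % (state,))
    if state == "I":
        return (False, False, "I", False)
    new_state = "I" if inv else "S"
    modified = state == "M"
    # A hit on an M-state line asserts HITM# and forces a write-back.
    return (True, modified, new_state, modified)
```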


For all types of inquiry, the write-backs are not
pipelined into an outstanding cycle, except when the
FLINE# pin is used (refer to section 5.3.5). ADS#
for the inquiry write-back is asserted from one to four
clock periods after the HITM# pin is driven active or
after the last BRDY# is returned for any outstanding
cycle, whichever occurs later.


Bursts for a HITM# write-back, as for any write-back,
are in the order 0, 8, 0x10, 0x18, because the
i860 XP microprocessor ignores A4-A3 of the
inquiry address.


Figure 5.20 shows an inquiry cycle that hits an
M-state line.


NOTES:
A Outstanding cycle (for example, a single-transfer read)
W Write-back cycle
1. EADS# is not sampled while HITM# is active
2. Earliest ADS# assertion if not delayed by outstanding cycle
3. ADS# for write-back delayed by outstanding cycle
4. HITM# deactivates after last BRDY# of write-back


Figure 5.20. Inquiry Hit Cycle with Write-Back


The fact that a write-back cycle is initiated while
address lines are floating supports multiple inquiries
(with write-backs) during a single AHOLD session.
This is especially useful during secondary cache
replacement processing, when the secondary-cache
line is larger than that of the i860 XP microprocessor.
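
The fixed write-back burst order noted above (0, 8, 0x10, 0x18) can be sketched as follows. The function name is hypothetical; the only behavior modeled is that the low address bits of the inquiry address are ignored and the four transfers always ascend from the start of the 32-byte line.

```python
def writeback_addresses(inquiry_address):
    """Return the four transfer addresses of a write-back burst.

    The low five address bits are masked off, so the burst always
    starts at the base of the 32-byte line and ascends.
    """
    line_base = inquiry_address & ~0x1F
    return [line_base + offset for offset in (0x00, 0x08, 0x10, 0x18)]
```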


Note that EADS # is ignored as long as HITM# is 
active. If the system is executing a series of inquir- 
ies, it might happen that the HITM# assertion for 
one inquiry masks the EADS# for a subsequent in- 
quiry. In that case the system must reassert EADS # 
to restart the masked inquiry. 


Inquiries can occur during a hold due to HOLD/
HLDA or BOFF#. However, in these cases, the cycle
definition pins and ADS# are floating. If an inquiry
requires a write-back, the HOLD or BOFF#
must be deasserted so that the cycle definition pins
and ADS# can be driven to start the write-back cycle.
If HITM# is active at the time of ADS#, the first
ADS# issued after HOLD is deasserted corresponds
to the write-back of the modified line which
was snooped.


5.3.2 SNOOPING RESPONSIBILITY LIMITS 


The i860 XP microprocessor takes responsibility for
responding to inquiry cycles for a cache line only
during the time that the line is actually in the cache
or in a write-back buffer. There are times during the
cache line fill cycle and during the cache replacement
cycle when the line is "in transit", and inquiry
(snooping) responsibility must be taken by other
system components.


Systems designers should consider the possibility
that an inquiry cycle may arrive at the same time as
a cache line fill or replacement for the same address.
This situation can occur:

• In multiprocessor systems that have external
  (secondary) caches with separate CPU and
  memory busses, thereby allowing concurrent
  activity on the two busses. In such systems, it is
  desirable to run invalidation cycles concurrently
  with other i860 XP microprocessor bus activity. It
  can happen that writes on the memory bus cause
  invalidation requests to the i860 XP microprocessor
  at the same time that the i860 XP microprocessor
  fetches data from the secondary cache.
  Such events can occur at any time relative to
  each other.

• In multiprocessor systems with no secondary
  cache, if memory is dual-ported. In such systems,
  two processors can simultaneously read the
  same line, each sending an inquiry to the other.


The simultaneous activities considered here may be
for different data items in the same cache line. Unless
the inquiry request is timed carefully with respect
to the cache fill cycle, the cache-consistency
mechanism may be subverted, and data inconsistencies
may result (for example, both CPUs may get
the line in E-state on a read). If the 82495XP and
82490XP cache is being used, the timing with respect
to the i860 XP microprocessor is handled correctly
by the cache controller; however, the same
problem may arise between the memory system and
the secondary cache.


There are two cases to consider: 
1. Inquiry for a line that is being cached. 
2. Inquiry for a line that is being replaced. 


5.3.2.1 Inquiry for a Line Being Cached 


The i860 XP microprocessor accepts an inquiry cycle
at any time, even if it hits the line being cached at
that time. Regardless of the timing of the cycle, the
i860 XP microprocessor delivers the read data to the
load instruction that initiated the read request. However,
the timing of the invalidation cycle determines
whether the line is placed in the cache and what
value the i860 XP microprocessor drives on HIT#.
Table 5.4 summarizes the different cases.


Table 5.4. Inquiry for a Line Being Cached

                                 EADS# before or     EADS# after
                                 with NA# or         NA# or
                                 1st BRDY#           1st BRDY#
Data/Instruction used by CPU?    YES                 [not legible]


If EADS# is asserted before or with the sampling of
KEN#, the processor cannot match the address of
the line being cached with an invalidation request.
Thus, the processor does not assert HIT#. The external
system must satisfy the inquiry with the correct
data and WB/WT# status. If invalidation of that
line is required, the system must do one of the
following:


• Delay assertion of EADS# until one clock after
  assertion of KEN#.


• Reassert EADS# after KEN#.


• Make KEN# inactive at the first BRDY# or NA#,
  thereby preventing the line from being cached.


Figures 5.21 and 5.22 show when the i860 XP microprocessor
picks up responsibility for inquiries for a
line that it is caching. Figure 5.21 shows the earliest
EADS# assertion that invalidates the line being
cached relative to the first BRDY# for nonpipelined
cycles. Figure 5.22 shows the earliest EADS# assertion
that invalidates the line being cached relative
to the first NA# for pipelined cycles. These timings
hold for normal and late back-off modes.


NOTES:
A Cache line fill cycle
S Snoop (inquiry) cycle
R Addresses of cache line fill and snoop are the same
1. Earliest EADS# assertion that can invalidate line being filled


Figure 5.21. Snoop Responsibility Pickup (Nonpipelined Cycle)
NOTES:
A Cache line fill cycle
B Next cycle (any type)
S Snoop (inquiry) cycle
R Addresses of cycles A and S are the same
1. Earliest EADS# assertion that can invalidate line being filled


Figure 5.22. Snoop Responsibility Pickup (Pipelined Cycle)
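

The pickup rule for a line being cached can be restated as a small model. The sketch below is illustrative only: it assumes clocks are numbered as integers and that `ken_sample_clk` is the clock in which KEN# is sampled (with NA# or the first BRDY#). The function and parameter names are hypothetical and are not part of any Intel tool.

```python
def cpu_owns_fill_snoop(eads_clk: int, ken_sample_clk: int) -> bool:
    """Return True if the processor itself can match an inquiry
    against the line it is currently filling.

    Assumption: both arguments are integer clock numbers on the same
    time base; ken_sample_clk is the clock of NA# or the first BRDY#.
    """
    # EADS# before or with the sampling of KEN#: the processor cannot
    # match the address and does not assert HIT#, so the external
    # system must satisfy the inquiry.
    return eads_clk > ken_sample_clk


def system_must_answer(eads_clk: int, ken_sample_clk: int) -> bool:
    """Complement of the rule above: responsibility stays outside the CPU."""
    return not cpu_owns_fill_snoop(eads_clk, ken_sample_clk)
```

A system designer could use such a predicate in a bus-level simulation to flag inquiries that arrive too early and therefore need to be delayed, reasserted, or handled by making the line noncacheable.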


5.3.2.2 Inquiry for a Line Being Replaced


When the i860 XP microprocessor is replacing a line,
there are two cases:


1. If the replacement does not require write-back,
   the address being replaced can be matched by
   an inquiry until assertion of NA# or first BRDY#
   of the line-fill cycle. From that point on, the
   inquiry has no effect.

2. If the replacement requires a write-back, the
   address being replaced can be matched by an
   inquiry until assertion of the last BRDY# for the
   write-back. An EADS# as late as two clocks before
   the last BRDY# can cause HITM# to be asserted.


Figures 5.23 through 5.25 show when the i860 XP
microprocessor drops responsibility for recognizing
inquiries for a line that it is writing back. They show
the latest EADS# assertion that can cause HITM#
assertion. In late back-off mode, EADS# can be asserted
later, because BRDY# is internally delayed
(Figures 5.24 and 5.25).


In all these cases, HITM# remains active for only
one clock period. HITM#, as always, remains active
through the last BRDY# of the corresponding write-back;
in these cases the write-back has already
completed.


If an inquiry cycle hits the write-back address after
its ADS# has been issued, the i860 XP microprocessor
asserts HITM#; however, HIT# is deasserted.
This unique combination of values on HIT# and
HITM# indicates that the write-back cycle corresponding
to the HITM# has already been issued.
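

External snoop logic has to distinguish the special HIT#/HITM# combination described above from an ordinary hit. The following sketch shows one way to decode the two pins; it is a hedged illustration, with hypothetical names, that assumes each pin is read as an integer level (0 = LOW = asserted, because both signals are active low). The "unmodified line" and "miss" labels are the conventional reading of the remaining combinations, not statements taken from this section.

```python
def decode_snoop_response(hit_n: int, hitm_n: int) -> str:
    """Classify an inquiry result from the active-low HIT# and HITM# pins.

    hit_n, hitm_n: sampled pin levels, 0 = asserted (LOW), 1 = deasserted.
    """
    hit = hit_n == 0
    hitm = hitm_n == 0
    if hitm and not hit:
        # Unique combination: the write-back that corresponds to this
        # HITM# has already been issued (ADS# already driven).
        return "write-back already issued"
    if hitm and hit:
        # Modified line found in the cache; a write-back cycle follows.
        return "modified line hit"
    if hit:
        # Assumed conventional meaning: line present but not modified.
        return "unmodified line hit"
    return "miss"
```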
NOTES:
A Write-back cycle
S Snoop (inquiry) cycle
R Addresses of cycles A and S are the same


Figure 5.23. Latest Snooping of Write-Back (Not Late Back-Off Mode)
NOTES:
A Write-back cycle
S Snoop (inquiry) cycle
R Addresses of cycles A and S are the same


Figure 5.24. Latest Snooping of Write-Back (One-Clock Late Back-Off Mode)




NOTES:
A Write-back cycle
S Snoop (inquiry) cycle
R Addresses of cycles A and S are the same


Figure 5.25. Latest Snooping of Write-Back (Two-Clock Late Back-Off Mode)


5.3.3 WRITE CYCLE REORDERING DUE TO BUFFERING


The MESI cache protocol and the ability to perform
and respond to inquiry cycles guarantee that writes
to the cache are logically equivalent to writes that go
to memory. In particular, the order of read and write
operations on cached data is the same as if the
operations were on data in memory. Even uncached
memory read and write requests usually occur on
the external bus in the same order that they are
issued in the program. For example, when a write miss
is followed by a read miss, the write data goes onto
the bus before the read request is put on the bus.
However, the posting of writes in write buffers coupled
with inquiry cycles may cause the order of
writes seen on the external bus to differ from the
order they appear in the program. Consider the
following example, which is illustrated in Figure 5.26:


1. Three bus cycles are outstanding.

2. Processor 1 executes a store to address A, which
   misses the cache. This store is posted; that is,
   the data is latched in the write buffer while the
   processor continues execution without waiting for
   the store to be completed on the bus. In this case
   the store is not even put on the bus because
   there are already three outstanding cycles.

3. Processor 1 executes a store to address B, which
   hits the cache.

4. Processor 2 executes an inquiry for address B.
   Processor 1 looks in its cache, finds the modified
   line, asserts HIT# and HITM#, and executes a
   write-back cycle to address B, while the data for
   address A is still in the write buffer.

5. Processor 1 issues the write to address A on the
   bus.


In this example, the original order of the writes has
been changed. In most cases it is not necessary that
the ordering of writes be strictly maintained. But
there are cases (for example, semaphore updates in
a multiprocessor system) that require stores to be
observed externally in the same order as programmed.
There are several ways to ensure serialization
of stores:


1. Bracket one of the stores with the lock and
   unlock instructions. That forces serialization of
   the stores (refer to section 5.4). In the above
   example of a store-miss followed by store-hit,
   locking either store would ensure that the internal
   store-hit does not update the cache until the miss
   gets to the external bus.

2. Apply the write-through policy to the critical data,
   by setting WT=1 in the page table entries or by
   driving the WB/WT# pin low.

3. Configure the processor for Strong Ordering
   Mode by asserting EWBE# during RESET.


Option 1 is implementable by user-level programs,
while option 2 is an operating-system level solution,
not directly implementable by user-level code.
Option 3, the hardware solution, is discussed in greater
detail in section 5.3.4.


NOTES:
A Data written by st.x A instruction
B Data written by st.x B instruction
1. Snoop for address of B
2. Snoop look-up in tag array occurs here; finds B modified
3. Write-back of line containing B occurs before write of A


Figure 5.26. Write Reordering Due to Buffering
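

The store-miss followed by store-hit example in this section can be replayed with a toy model. The sketch below is not a model of the actual bus unit; it assumes a single posted write buffer, a set of modified lines, and a list that records the order in which writes reach the external bus. All names are hypothetical.

```python
from collections import deque


def replay_reordering_example():
    """Replay the store A (miss) / store B (hit) / snoop B example."""
    bus_order = []           # writes in the order the external bus sees them
    write_buffer = deque()   # posted stores waiting for the bus
    modified_lines = set()   # cache lines currently in the M state

    # Store to A misses the cache and is posted in the write buffer.
    write_buffer.append("A")

    # Store to B hits the cache; the line is now modified.
    modified_lines.add("B")

    # Another processor's inquiry for B hits the modified line, so the
    # write-back of B goes to the bus while A is still buffered.
    if "B" in modified_lines:
        bus_order.append("B")
        modified_lines.discard("B")

    # The posted store to A finally reaches the bus.
    while write_buffer:
        bus_order.append(write_buffer.popleft())

    return bus_order


# Program order was A then B; the bus observes the opposite order.
print(replay_reordering_example())
```

The point of the sketch is only that the inquiry write-back bypasses the posted store; any of the three serialization options listed above removes that bypass.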


5.3.4 STRONG ORDERING MODE 


In strong ordering mode, the processor delays updates
to its internal data cache in either of these
conditions:


1. The internal write buffer is not empty.


2. An external write buffer is not empty (the external
   system signals this condition by deactivating the
   EWBE# signal).


By delaying the cache update until all write buffers
are empty, the i860 XP microprocessor avoids the
out-of-order sequence shown in section 5.3.3.
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In strong ordering mode, EWBE# can be deassert- 
ed only between the ADS# and the last BRDY# of 
a store. The earliest deassertion is the clock after 
ADS #; the latest deassertion is together with the 
last BRDY#. EWBE# can be reasserted at any 
time, except when the processor is performing an 
inquiry write-back. In other words, EWBE# must not 
activate while HITM# is active. When EWBE# goes 
active, the processor completes any cache update 
that may have been delayed by its deassertion. 
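

The strong ordering rule can be expressed as a single predicate. The sketch below is a simplified illustration under stated assumptions: `ewbe_n` is the sampled level of the active-low EWBE# pin (0 = asserted, meaning external write buffers are empty), and the example function replays the store A / store B scenario of section 5.3.3 with the cache update held back. Names are hypothetical.

```python
def cache_update_allowed(internal_buffer_empty: bool, ewbe_n: int) -> bool:
    """Strong ordering: update the data cache only when the internal
    write buffer is empty and EWBE# is asserted (LOW)."""
    return internal_buffer_empty and ewbe_n == 0


def strong_ordering_example():
    """Replay store A (miss, posted) then store B (hit) with the
    cache update for B deferred until all write buffers are empty."""
    bus_order = []
    write_buffer = ["A"]      # posted store-miss to A
    deferred_update = "B"     # store-hit to B, held back by strong ordering
    modified_lines = set()

    # An inquiry for B arriving now finds no modified line, so no
    # write-back can overtake the buffered store to A.
    snoop_hits_modified = "B" in modified_lines

    # The store to A drains to the bus.
    bus_order.append(write_buffer.pop(0))

    # Buffers are now empty and EWBE# is assumed asserted: update the cache.
    if cache_update_allowed(not write_buffer, ewbe_n=0):
        modified_lines.add(deferred_update)

    return snoop_hits_modified, bus_order, sorted(modified_lines)
```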


Figure 5.27 shows how an external cache can use
EWBE# when a store miss in the i860 XP microprocessor
is also a miss in the external cache.


An external cache controller should also refrain from
updating the external cache while EWBE# is active.


NOTE: 
1. Assumes the external cache needs five cycles to write the data to memory. 
2. Pending internal data cache updates are delayed until the clock in which EWBE# is sampled LOW. 
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Figure 5.27. Timing of EWBE # 


5.3.5 SCHEDULING INQUIRY WRITE-BACK CYCLES


In order to preserve system-wide ordering of memory
transactions in multiprocessor systems that have
a pipelined or split-transaction memory bus, it may
be necessary to get the data corresponding to an
inquiry hit before outstanding bus cycles are completed.
Another bus master can always request an
inquiry while the i860 XP microprocessor has cycles
outstanding on the bus. However, when AHOLD is
asserted, the i860 XP microprocessor normally completes
outstanding cycles before it performs any
write-back that may be required. The i860 XP
microprocessor provides two methods for causing the
inquiry write-back before outstanding cycles are
completed:


FLINE#  When FLINE# is asserted during the
        EADS# of an inquiry that hits an M-state
        line, the i860 XP microprocessor issues a
        write-back cycle and writes the dirty line to
        memory before the outstanding bus cycles
        are completed.


BOFF#   If there are outstanding cycles on the bus,
        asserting BOFF# clears the bus pipeline.
        If an inquiry causes HITM# to be asserted,
        then the first cycle issued by the i860 XP
        microprocessor after deassertion of
        BOFF# is the inquiry write-back cycle. After
        the inquiry write-back, it reissues the
        aborted cycles.


5.3.5.1 Choosing between FLINE# and BOFF#


FLINE#, although the more efficient choice, cannot
handle all situations. Under certain circumstances, it
can happen that outstanding stores on the bus
correspond to data that is obsolete relative to the data
in the cache, because a subsequent store has updated
the cache after the ADS# for the outstanding
store has occurred. For example:


• An aliasing store hit, in which a cache virtual-tag
  miss occurs and the ADS# is issued at the same
  time as a physical-tag hit. Then the cached data
  would be updated before external memory, and a
  subsequent store to the new virtual address
  could also update cache before the outstanding
  bus store completed.


• Back-to-back writes to the same line can also update
  the cache more recently than the bus when
  the write-once update policy is employed. The
  first write updates the cache and generates a bus
  write request, but the second write only updates
  the cache.


In both of these examples the outstanding stores on
the bus are obsolete relative to the data in the cache
line. If an inquiry cycle hits a line and this line is
written back out of order (that is, before outstanding
stores are completed), special care should be taken
to discard the outstanding stores.


The easiest way to avoid this situation is not to assert
FLINE# when stores are outstanding, but use
BOFF# instead. If out-of-order write-back is implemented
with BOFF#, the i860 XP microprocessor
does not restart the outstanding store to that line if
such a store has been obsoleted by a later cache hit 
store. That is, the i860 XP microprocessor detects 
this condition and kills the obsolete data. However, 
lock-bracketed stores (including the last store in the 
lock sequence) are restarted by the i860 XP micro- 
processor, because lock-bracketed stores update 
the cache only after BRDY # is returned. 




If, on the other hand, out-of-order write-back is
implemented by using only the FLINE# pin, the external
system must return BRDY#s for outstanding
stores, but the data must be ignored if it has already
been written out by an inquiry write-back.


Note that if a replacement write-back is in progress
(ADS# has been issued, but last BRDY# has not
occurred) and an inquiry hits the same line that is
being written back, the FLINE# pin is ignored. The
system can recognize this special case by the fact
that HITM# is asserted while HIT# is deasserted. If
other cycles are outstanding and it is necessary to
write the line back before the other cycles, BOFF#
can be used.
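

The guidance in this section reduces to a simple selection rule for external logic. The helper below is a sketch, not Intel's method: it takes the "easiest way" described above (never assert FLINE# while stores are outstanding) and the special case in which FLINE# is ignored. The function and argument names are hypothetical.

```python
def choose_reorder_method(stores_outstanding: bool,
                          replacement_writeback_in_progress: bool) -> str:
    """Pick the pin to use for scheduling an inquiry write-back ahead
    of outstanding bus cycles."""
    if replacement_writeback_in_progress:
        # FLINE# is ignored when the inquiry hits a line whose replacement
        # write-back has already started (HITM# asserted, HIT# deasserted).
        return "BOFF#"
    if stores_outstanding:
        # Outstanding stores may be obsolete relative to the cache; with
        # BOFF# the processor itself drops obsoleted stores on restart.
        return "BOFF#"
    # No outstanding stores: FLINE# is the more efficient choice.
    return "FLINE#"
```

A system that prefers FLINE# even with stores outstanding would instead need the extra logic described above to return BRDY# for those stores and discard their data.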

5.3.5.2 Reordering Write-Backs with FLINE#


FLINE# must be active during the EADS# that initiates
an inquiry. BRDY# must not be asserted for the
previously issued cycles while HITM# is active. If
HITM# is asserted while the data transfer of the
outstanding cycle is in progress (i.e., first BRDY#
has been asserted, but the entire transfer has not
yet been completed), the i860 XP microprocessor
waits for the current cycle to complete, and only
then issues the write-back. After the last BRDY# for
the ongoing burst (if any), BRDY# is ignored until
the clock period after ADS# is asserted for the
write-back.


From the viewpoint of the i860 XP microprocessor,
an inquiry write-back cycle is just another bus cycle;
so, if there is an outstanding cycle at the time of
FLINE# and HITM# activation, the system must assert
NA# to initiate the write-back.


Figure 5.28 illustrates simple cycle reordering, when 
FLINE# is not asserted during the data transfer of 
another cycle. The outstanding request could be ei- 
ther a read or write. 


Figure 5.29 shows the case in which FLINE # is as- 
serted after data transfer for the outstanding cycle 
has already started. In this case, the i860 XP micro- 
processor does not issue a write-back until the out- 
standing transfer is completed. NA# is needed in 
this example only if other outstanding cycles remain. 


Figure 5.28. Cycle Reordering via FLINE# (No Ongoing Burst)


NOTES:
1. BRDY# is ignored by CPU from end of ongoing burst until ADS# of write-back, even if other cycles remain
   outstanding.
2. NA# required only if another cycle is outstanding.
3. If the first BRDY# is asserted here or sooner (relative to HITM#), the outstanding cycle completes before the
   FLINE# write-back.


Figure 5.29. Cycle Reordering via FLINE# (Ongoing Burst)
5.3.5.3 Reordering Write-Backs with BOFF#


Back-off cycles are discussed in general in Section
5.2.2. Figure 5.30 shows how BOFF# can be used
to cancel outstanding cycles so that an inquiry
write-back can take place immediately.


NOTES:
A Outstanding cycle (for example, noncacheable 128-bit load)
W Write-back cycle
1. AHOLD begins an inquiry while one cycle is outstanding.
2. Earliest assertion of EADS# is two clocks after assertion of AHOLD.
3. Inquiry hits modified line.
4. Assertion of BOFF# aborts the outstanding cycle.
5. BRDY# asserted during BOFF# is ignored by CPU.
6. Write-back begins after deassertion of BOFF#.
7. Earliest assertion of ADS# for restart of cycle A (assuming no pipelining).


Figure 5.30. Cycle Reordering via BOFF# (Ongoing Burst)


5.4 The LOCK# Cycle Attribute


The processor asserts the LOCK# signal when several
accesses to a single memory location must be
effectively uninterruptible. By causing LOCK# to be
asserted, a programmer can, for example, increment
the contents of a memory variable and be assured
that the variable will not be accessed between the
read and the update of that variable.
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The memory location to be locked is the one whose 
address is driven during the cycle in which LOCK # 
is first activated. In multiprocessor systems, external 
hardware should guarantee that no other processor
is granted a locked read, locked write, or unlocked
write to the same location until LOCK# is deasserted.
The i860 XP microprocessor has no hardware
provision to prevent another master from also lock- 
ing the variable; this responsibility falls on the bus 
arbiter. In the simplest implementation, the arbiter 
can globally prevent other masters from accessing 
the bus. 


Not all cycles affect the value of LOCK#. Code 
fetches, write-backs due to replacement or inquiry, 
and cycles restarted due to BOFF# do not affect 
LOCK#. Any other type of cycle can be used to initi- 
ate or terminate LOCK#, including cache line fills, 
interrupt acknowledge, I/O, and special cycles. 


Data accesses with LOCK# asserted are not pipe- 
lined, and other data cycles are not pipelined while a 
LOCK# cycle remains outstanding. Instruction 
fetches, however, may be pipelined during lock. 
The i860 XP microprocessor can run very long lock
sequences; therefore, to guarantee reasonable bus
turnover latency in multimaster systems, the i860 XP
microprocessor recognizes bus hold (HOLD), address
hold (AHOLD), and back-off (BOFF#) while
the LOCK# signal is active. In spite of such intervening
conditions, the arbiter should prevent any
other bus master from also locking or updating the
variable the i860 XP microprocessor locked. In simple
systems the HOLD input can be masked by the
LOCK# output (that is, the external logic that generates
HOLD can AND the LOCK# signal with other
hold conditions). More sophisticated systems, however,
may allow the bus to be turned over while
LOCK# is asserted.


Whatever the lock implementation, arbiter design
must, in one case, allow another processor to write
the locked variable. That case is when another
i860 XP microprocessor or master asserts HITM# in
response to the inquiry generated by the locking
processor's initial read. That other master must write
back the locked variable before the i860 XP
microprocessor can read it. This HITM# write-back must
always be allowed.


The timing of LOCK# is shown in Figure 5.31. Note
that LOCK# is asserted in the same clock period as
ADS# for the locked address, but is deasserted in
the clock period after ADS# for the unlocking load
or store.


NOTES:
L Locking access
U Unlocking access
1. This address is to be locked
2. LOCK# is asserted with ADS#
3. LOCK# is deasserted one clock after ADS#


Figure 5.31. LOCK# Timing
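

The simple-system HOLD masking mentioned in this section (ANDing the LOCK# signal with the other hold conditions) can be sketched in a few lines. This is an illustrative model only, with hypothetical names: signals are integer levels, and because LOCK# is active low, a locked sequence drives it to 0, which forces the gated HOLD to 0.

```python
def gated_hold(lock_n: int, hold_request: int) -> int:
    """Return the level to drive on the processor's HOLD input.

    lock_n:       level of the LOCK# output (0 = lock sequence in progress)
    hold_request: 1 when the arbiter's other conditions call for HOLD
    """
    # LOCK# LOW masks the request; LOCK# HIGH passes it through.
    return lock_n & hold_request
```

An arbiter built this way never turns the bus over during a lock sequence. It therefore needs a separate path for the one case this section requires, the HITM# write-back of the locked variable by another master.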
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5.5 RESET Initialization 


Initialization of the i860 XP microprocessor is caused
when the system asserts the RESET signal for at
least ten clocks. Table 5.5 shows the status of output
pins during the time that RESET is asserted.
Note that the bidirectional data pins (D63-D0 and
DP7-DP0) are floated during RESET, though the
bidirectional A31-A3 pins are not. If the i860 XP
microprocessor is used with 82495XP and 82490XP
cache, however, the latter do float the bidirectional
pins they share with i860 XP microprocessor during
RESET. Note that HOLD requests are honored during
RESET and that the HLDA output signal may
also become active. The status of output pins depends
on whether a HOLD request is being acknowledged.
Note also that the test logic may be active
during RESET and that the EXTEST instruction may
drive other values on the output pins.


After the RESET signal goes inactive the processor
remains in the RESET state for three more clocks.
Applications that use the HOLD signal to float the
bus during RESET should keep HOLD active for
three more clocks after the RESET signal is
deactivated.


Some aspects of processor configuration are determined
by asserting input signals during RESET. To
select a given option, the corresponding input must
be asserted for at least the last three clocks before
the falling edge of RESET; to deselect, the corresponding
input must be deasserted for at least the
last three clocks before the falling edge of RESET:


EWBE#     Enter strong ordering mode.
FLINE#    Enter one-clock late back-off mode.
INT/CS8   Enter eight-bit code-size mode.
PEN#      Enter normal (small output pulls) current mode.


Figure 5.32 shows how configuration pins are sampled
during the three clock periods just before the
falling edge of RESET. No inputs besides EWBE#,
HOLD, FLINE#, INT/CS8, and PEN# are sampled
during RESET.
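

The three-clock sampling requirement can be checked in a testbench with a helper like the one below. This is a sketch under stated assumptions: `levels` is a list of pin levels, one per clock, ending with the last clock before the falling edge of RESET; `asserted_level` is 0 for the active-low pins (EWBE#, FLINE#, PEN#) and is assumed to be 1 for INT/CS8. The function name and the `None` result for a pin that changes inside the window are illustrative choices; the text does not define the outcome in that case.

```python
def sample_reset_option(levels, asserted_level=0):
    """Decide whether a RESET configuration option is selected.

    Returns True (selected), False (deselected), or None when the pin
    was not held stable for the last three clocks before RESET falls.
    """
    last_three = list(levels)[-3:]
    if len(last_three) < 3:
        return None
    if all(v == asserted_level for v in last_three):
        return True
    if all(v != asserted_level for v in last_three):
        return False
    return None
```

For example, `sample_reset_option([1, 1, 0, 0, 0])` models EWBE# pulled LOW for the final three clocks, which selects strong ordering mode.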


Table 5.5. Output Pin Status during Reset

Pin Name                        HOLD Not        HOLD
                                Acknowledged    Acknowledged
BREQ                            LOW             LOW
HLDA                            LOW             HIGH
W/R#, PWT, PCD                  LOW             Tristate OFF
ADS#                            HIGH            Tristate OFF
D63-D0, DP7-DP0                 Tristate OFF    Tristate OFF
A31-A3, BE7#-BE0#, NENE#,       Undefined       Tristate OFF
  CACHE#, CTYP, D/C#, KB0,
  KB1, LEN, M/IO#, PCYC
PCHK#, HIT#                     Undefined       Undefined
HITM#, LOCK#                    HIGH            HIGH

NOTE:
This table does not apply if the test logic is running the EXTEST instruction.


NOTE:
1. The CPU samples these inputs (EWBE#, FLINE#, INT/CS8, PEN#) in the clocks preceding the falling edge of reset.


Figure 5.32. Reset Activities




While in eight-bit code-size mode, instruction cache
misses are one-byte reads (transferred on D7-D0 of
the data bus) instead of eight-byte reads. This allows
the i860 XP microprocessor to be bootstrapped from
an eight-bit ROM. For these code reads, byte enables
BE2#-BE0# are redefined to be the low-order
three bits of the address, so that a complete
byte address is available. The entire eight-byte data
bus continues to be parity-checked by the i860 XP
microprocessor during CS8-mode instruction fetches;
therefore, external hardware must either generate
good parity on all eight bytes or disable parity
traps by deasserting PEN# during CS8 mode.
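

ROM decode logic for CS8 mode has to combine the A31-A3 address pins with the redefined byte enables. The sketch below shows the arithmetic; it assumes the three BE pins are read as plain address bits (BE2# as bit 2, BE0# as bit 0) with no inversion, which the text implies but does not spell out. The function and parameter names are hypothetical.

```python
def cs8_byte_address(a31_a3: int, be2_be0: int) -> int:
    """Form the full byte address of a CS8-mode code read.

    a31_a3:  value on address pins A31-A3 (the address with its low
             three bits removed)
    be2_be0: levels on BE2#-BE0# packed as a 3-bit integer, assumed to
             carry address bits 2-0 directly
    """
    return (a31_a3 << 3) | (be2_be0 & 0b111)
```

With this layout, eight consecutive one-byte reads that share the same A31-A3 value fetch one eight-byte instruction pair from the ROM.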


While in this mode, instructions must reside in an 
eight-bit wide memory, while data must reside in a 
separate 64-bit wide memory. After the code has 
been loaded into 64-bit memory, initialization code 
can initiate 64-bit code fetches by clearing the CS8 
bit of the dirbase register (refer to section 2). Once 
eight-bit code-size mode is disabled by software, it 
cannot be reenabled except by resetting the i860 XP 
microprocessor. 


Instruction fetches in CS8 mode update the instruction
cache if KEN# is asserted during NA# or all of
the first eight BRDY#s (refer to section 4.2.26).
They are pipelined if NA# is asserted. When used
with the 82495XP and 82490XP cache, CS8 mode
works only if the ROM locations are made
noncacheable.


6.0 TESTABILITY 


The i860 XP microprocessor provides testability features
compatible with the proposed Standard Test
Access Port and Boundary-Scan Architecture (IEEE
Std. P1149.1/D6). The subset of the standard test
logic implemented in the i860 XP microprocessor
provides for testing the interconnections between
the i860 XP microprocessor and other integrated circuits
once they have been assembled onto a printed
circuit board.


The test logic consists of a boundary-scan register 
and other building blocks that are accessed through 
a test access port (TAP). The TAP provides a simple 
serial interface that makes it possible to test all sig- 
nal traces with only a few probes. 


The TAP can be controlled by a bus master. The bus 
master can be either automatic test equipment or a 
component that interfaces to a four-pin test bus. 
6.1 Test Architecture


The test logic contains the following elements:


• Test access port (TAP), which consists of input
  pins TMS, TCK, TDI, and TRST#; and output pin
  TDO.


• TAP controller, which receives the dedicated test
  clock (TCK) and interprets the signals on the test
  mode select (TMS) line. The TAP controller generates
  clock and control signals for the instruction
  and test data registers and for other parts of
  the test logic.


• Instruction register (IR), which allows instruction
  codes to be shifted into the test logic. The instruction
  codes are used to select the test to be
  performed or the test data register to be accessed.


• Test data registers: Bypass Register (BPR), Device
  Identification Register (DID), and Boundary-Scan
  Register (BSR).


The instruction and test data registers are separate
shift-register paths connected in parallel and having
a common serial data input and a common serial
data output connected to the TAP TDI and TDO signals
respectively.


6.2 Test Data Registers 


The test logic contains the following data registers: 


• Bypass Register (BPR): BPR is a one-bit shift
  register that provides a minimum-length path between
  TDI and TDO when no test operation of
  the component is required. This allows more rapid
  movement of test data to and from other board
  components that are required to perform test
  operations. While running through BPR, the data is
  transferred without inversion from TDI to TDO.


• Device Identification Register (DID): This register
  contains the manufacturer's identification
  code, part number code, and version code in the
  format shown by Figure 6.1. The values are:
  manufacturer's identification code (9), part number
  code (61A0), version code (8), entire 32-bit value
  (0x861A0013).


• Boundary Scan Register (BSR): The BSR is a
  single shift-register path containing 150 cells that
  are connected to all input and output pins of the
  i860 XP microprocessor. Figure 6.2 shows the
  logical structure of the BSR. Input cells only capture
  data; they do not affect operation of the
  i860 XP microprocessor. Data is transferred without
  inversion from TDI to TDO through the BSR
  during scanning. The BSR can be operated by
  the EXTEST and SAMPLE instructions.
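

The three DID fields can be recovered from the 32-bit value by shifting and masking. The sketch below assumes the standard IEEE 1149.1 device identification layout (version in bits 31-28, part number in bits 27-12, manufacturer identity in bits 11-1, and a fixed 1 in bit 0); that layout is an assumption here, although it reproduces the values quoted above. The function name is hypothetical.

```python
def decode_did(value: int) -> dict:
    """Split a 32-bit DID value into its fields (assumed 1149.1 layout)."""
    return {
        "version": (value >> 28) & 0xF,          # bits 31-28
        "part_number": (value >> 12) & 0xFFFF,   # bits 27-12
        "manufacturer": (value >> 1) & 0x7FF,    # bits 11-1
        "lsb": value & 0x1,                      # bit 0, fixed at 1
    }


fields = decode_did(0x861A0013)
print(hex(fields["part_number"]), fields["manufacturer"], fields["version"])
```

A board-level test program might compare these fields against the expected part before running EXTEST patterns.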
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Figure 6.1. Format of DID Register 


Figure 6.2. Logical Structure of BSR Register
6.3 Instruction Register


The Instruction Register (IR) selects the test to be
performed and the test data register to be accessed.
It is four bits wide, with no parity bit. Table 6.1 shows
the encoding of the instructions supported by the
TAP controller of the i860 XP microprocessor. The
rightmost bit is the least significant and is the first
shifted out on TDO.


Table 6.1. TAP Instruction Encoding

Instruction Code    Instruction
0000                EXTEST Boundary Scan
0001                SAMPLE Boundary Scan
0010                IDCODE
0011...1110         Intel reserved CAUTION*
1111                BYPASS

* CAUTION: Operation of these private instructions may
  cause damage to the component.


EXTEST  The BSR cells associated with output pins
        drive the output pins of the i860 XP
        microprocessor. Values scanned into the BSR
        cells become the output values. The BSR
        cells associated with input pins sample
        the inputs of the i860 XP microprocessor.
        Note that I/O pins can be input or output
        for this test, depending on their control
        setting. The values shifted to the input
        latches are not used by the internal logic
        of the i860 XP microprocessor. After use
        of the EXTEST command, the i860 XP
        microprocessor must be reset (with the
        RESET signal) before normal use.


SAMPLE  The BSR cells associated with output pins
        sample the value driven by the i860 XP
        microprocessor. BSR cells associated
        with input pins sample on the rising edge
        of TCK the values driven to the i860 XP
        microprocessor. BSR cells associated
        with I/O pins sample the value on the
        respective pin. The I/O pin can be driven by
        the i860 XP microprocessor or by external
        hardware. The values shifted to the input
        latches are not used by the internal logic
        of the i860 XP microprocessor.


IDCODE  The identification code of the i860 XP
        microprocessor from the DID register is
        passed to TDO. The DID register is not
        altered by data shifted in on TDI.


BYPASS  Test data is passed from TDI to TDO via
        the single-bit BPR, effectively bypassing
        the test logic of the i860 XP microprocessor.
        Because of its special encoding, this
        instruction can be entered by holding TDI
        HIGH while completing an instruction-scan
        cycle. This reduces the demands on
        the host test system in cases where access
        is required, for example, only to chip
        57 on a 100-chip board.

        Note that an open circuit fault in the
        board-level test data path causes the
        BPR register to be selected following an
        instruction-scan cycle, because the TDI
        input has a pull-up resistor. Therefore, no
        unwanted interference with the operation
        of the on-chip system logic can occur.


Table 6.2 defines which registers are active during 
execution of each instruction. 
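

A test program typically keeps the four-bit instruction codes in a table and serializes them for the instruction-scan cycle. The sketch below assumes the codes are shifted in on TDI least significant bit first, mirroring the stated shift-out order on TDO; that shift-in order is an assumption, as are the names used. The BYPASS code of all ones follows from the statement that it can be entered by holding TDI HIGH.

```python
# Instruction codes for the four public TAP instructions.
TAP_INSTRUCTIONS = {
    "EXTEST": 0b0000,
    "SAMPLE": 0b0001,
    "IDCODE": 0b0010,
    "BYPASS": 0b1111,
}


def ir_scan_bits(name: str) -> list:
    """Return the four TDI bit values for an instruction, LSB first
    (assumed shift order)."""
    code = TAP_INSTRUCTIONS[name]
    return [(code >> bit) & 1 for bit in range(4)]
```

Because BYPASS is all ones, a host that simply holds TDI HIGH (or a broken TDI trace with its pull-up) loads BYPASS, which matches the fail-safe behavior described for open-circuit faults.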


6.4 TAP Controller 


The TAP Controller is a synchronous, finite state
machine. It controls the sequence of operations of
the test logic. The TAP Controller changes state
only in response to the following events:

1. A rising edge of TCK.
2. A transition to logic zero at the TRST# input.
3. Power-up.
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The value of the TMS input signal at a rising edge of 
TCK controls the sequence of state changes. The 
state diagram for the TAP controller is shown in Fig- 
ure 6.3. Test designers must consider the operation 
of the state machine in order to design the correct 
sequence of values to drive on TMS. 


6.4.1 TEST-LOGIC-RESET STATE 


In this state, the test logic is disabled so that normal 
operation of the i860 XP microprocessor can contin- 
ue unhindered. This is achieved by initializing the in- 
struction register such that the IDCODE instruction 
is loaded. No matter what the original state of the 
controller, the controller enters Test-Logic-Reset 
when the TMS input is held HIGH for at least five 
rising edges of TCK. The controller remains in this 
state while TMS is HIGH. 


If the controller leaves the Test-Logic-Reset state as
a result of an erroneous LOW signal on the TMS line
at the time of a rising edge of TCK (for example, a
glitch due to external interference), it returns to the
Test-Logic-Reset state following three rising edges
of TCK while the TMS signal is at the intended HIGH
logic level. The operation of the test logic is such
that no disturbance is caused to on-chip system logic
operation as the result of such an error. On leaving
the Test-Logic-Reset state, the controller moves
into the Run-Test/Idle state, where no action occurs
because the current instruction has been set to select
operation of the DID register. The test logic is
also inactive in the Select-DR-Scan and Select-IR-Scan
states.


The TAP controller is also forced to the Test-Logic-Reset
state by applying a LOW logic level to the
TRST# input and at power-up.


Table 6.2. Registers Active by Instruction

Instruction   BSR                  BPR                  DID
EXTEST        TDI -> BSR -> TDO    Inactive             Inactive
SAMPLE        TDI -> BSR -> TDO    Inactive             Inactive
IDCODE        Inactive             Inactive             DID -> TDO
BYPASS        Inactive             TDI -> BPR -> TDO    Inactive


NOTE:
0, 1: The values present on TMS at the time of a rising edge on TCK.


Figure 6.3. TAP Controller State Diagram


6.4.2 RUN-TEST/IDLE STATE


The controller enters this state between scan operations.
Once in this state, the controller remains in
this state as long as TMS is held LOW. No activity
occurs in the test logic. The instruction register and
all test data registers retain their previous state.
When TMS is HIGH and a rising edge is applied to
TCK, the controller moves to the Select-DR-Scan
state.


6.4.3 SELECT-DR-SCAN STATE


This is a temporary controller state. The test data
register selected by the current instruction retains its
previous state. If TMS is held LOW and a rising edge
is applied to TCK when in this state, the controller
moves into the Capture-DR state, and a scan sequence
for the selected test data register is initiated.
If TMS is held HIGH and a rising edge is applied to
TCK, the controller moves to the Select-IR-Scan
state.
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The instruction does not change in this state. 


6.4.4 SELECT-IR-SCAN STATE 


This is a temporary controller state. The test data
register selected by the current instruction retains its
previous state. If TMS is held LOW and a rising edge
is applied to TCK when in this state, the controller
moves into the Capture-IR state, and a scan sequence
for the instruction register is initiated. If TMS
is held HIGH and a rising edge is applied to TCK, the
controller moves to the Test-Logic-Reset state.


The instruction does not change in this state. 


6.4.5 CAPTURE-DR STATE 


In this state, the BSR captures input pin data if the
current instruction is EXTEST or SAMPLE. The other
test data registers, which do not have parallel input,
are not changed.


The instruction does not change in this state.


When the TAP controller is in this state and a rising
edge is applied to TCK, the controller enters the
Exit1-DR state if TMS is HIGH or the Shift-DR state
if TMS is LOW.


6.4.6 SHIFT-DR STATE


In this controller state, the test data register connected
between TDI and TDO as a result of the current
instruction shifts data one stage toward its serial
output on each rising edge of TCK.


The instruction does not change in this state.


When the TAP controller is in this state and a rising
edge is applied to TCK, the controller enters the
Exit1-DR state if TMS is HIGH or remains in the
Shift-DR state if TMS is LOW.


6.4.7 EXIT1-DR STATE


This is a temporary state. If TMS is held HIGH, a
rising edge applied to TCK while in this state causes
the controller to enter the Update-DR state, which
terminates the scanning process. If TMS is held LOW
and a rising edge is applied to TCK, the controller
enters the Pause-DR state.


The test data register selected by the current instruction
retains its previous state unchanged. The
instruction does not change in this state.


6.4.8 PAUSE-DR STATE


The pause state allows the test controller to temporarily
halt the shifting of data through the test data
register in the serial path between TDI and TDO.
This might be necessary, for example, to allow the
tester to reload its pin memory from disk during
application of a long test sequence.


The test data register selected by the current instruction
retains its previous state. The instruction
does not change in this state.


The controller remains in this state as long as TMS
is LOW. When TMS goes HIGH and a rising edge is
applied to TCK, the controller moves to the Exit2-DR
state.


6.4.9 EXIT2-DR STATE


This is a temporary state. If TMS is held HIGH and a
rising edge is applied to TCK, the scanning process
terminates, and the TAP controller enters the
Update-DR state. If TMS is held LOW and a rising
edge is applied to TCK, the controller enters the
Shift-DR state.


The test data register selected by the current instruction
retains its previous state unchanged. The
instruction does not change in this state.


6.4.10 UPDATE-DR STATE


The BSR register is provided with a latched parallel
output to prevent changes at the parallel output
while data is shifted in response to the EXTEST and
SAMPLE instructions. When the TAP controller is in
this state and the BSR register is selected, data is
latched onto the parallel output of this register from
the shift-register path on the falling edge of TCK.
The data held at the latched parallel output does not
change other than in this state.


All shift-register stages in test data registers selected
by the current instruction retain their previous
state unchanged. The instruction does not change in
this state.


When the TAP controller is in this state and a rising
edge is applied to TCK, the controller enters the
Select-DR-Scan state if TMS is held HIGH or the
Run-Test/Idle state if TMS is held LOW.


6.4.11 CAPTURE-IR STATE 


In this controller state the shift register contained in
the instruction register loads the fixed value 0001 on
the rising edge of TCK.


The test data register selected by the current instruction
retains its previous state. The instruction
does not change in this state.


When the controller is in this state and a rising edge
is applied to TCK, the controller enters the Exit1-IR
state if TMS is held HIGH or the Shift-IR state if TMS
is held LOW.


6.4.12 SHIFT-IR STATE 


In this state, the shift register contained in the instruction
register is connected between TDI and
TDO and shifts data one stage towards its serial output
on each rising edge of TCK.


The test data register selected by the current instruction
retains its previous state. The instruction
does not change in this state.


When the controller is in this state and a rising edge
is applied to TCK, the controller enters the Exit1-IR
state if TMS is held HIGH or remains in the Shift-IR
state if TMS is held LOW.
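
The effect of an instruction scan on the shift register can be followed bit by bit. The sketch below is a behavioral model under stated assumptions: a 4-bit instruction register (inferred from the 4-bit value loaded in Capture-IR), with the least-significant bit in the stage nearest TDO. The function name shift_ir is hypothetical.

```python
def shift_ir(captured, tdi_bits, width=4):
    """Model the Shift-IR state for a width-bit register.

    captured is the value loaded in Capture-IR; tdi_bits holds one TDI value
    (0 or 1) per rising edge of TCK. Returns (bits seen on TDO, final contents).
    Assumes the least-significant bit sits in the stage nearest TDO.
    """
    reg = captured
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(reg & 1)                 # stage nearest TDO leaves first
        reg = (reg >> 1) | (bit << (width - 1))  # new bit enters at the TDI end
    return tdo_bits, reg


# Hold TDI HIGH for a complete four-bit instruction scan.
tdo_bits, new_ir = shift_ir(0b0001, [1, 1, 1, 1])
```

Holding TDI HIGH for the whole scan leaves all ones in the shift register, which is the condition described earlier for entering BYPASS, while the captured pattern 0001 appears on TDO as 1, 0, 0, 0.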


6.4.13 EXIT1-IR STATE


This is a temporary state. If TMS is held HIGH, a
rising edge applied to TCK while in this state causes
the controller to enter the Update-IR state, which
terminates the scanning process. If TMS is held LOW
and a rising edge is applied to TCK, the controller
enters the Pause-IR state.


The test data register selected by the current instruction
retains its previous state unchanged. The
instruction does not change in this state, and the
instruction register retains its state.


6.4.14 PAUSE-IR STATE


This state allows the shifting of the instruction register
to be temporarily halted.


The test data register selected by the current instruction
retains its previous state. The instruction
does not change in this state, and the instruction
register retains its state.


The controller remains in this state as long as TMS
is LOW. When TMS goes HIGH and a rising edge is
applied to TCK, the controller moves to the Exit2-IR
state.


6.4.15 EXIT2-IR STATE


This is a temporary state. If TMS is held HIGH and a
rising edge is applied to TCK, the scanning process
terminates, and the TAP controller enters the
Update-IR state. If TMS is held LOW and a rising
edge is applied to TCK, the controller enters the
Shift-IR state.


The test data register selected by the current instruction
retains its previous state unchanged. The
instruction does not change in this state, and the
instruction register retains its state.


6.4.16 UPDATE-IR STATE


The instruction shifted into the instruction register is
latched onto the parallel output from the shift-register
path on the falling edge of TCK. Once the new
instruction has been latched, it becomes the current
instruction.


Test data registers selected by the current instruction
retain the previous state.


6.5 Boundary Scan Register Cell Ordering


Figure 6.4 shows the order of cells in the BSR.
There are 150 cells including TDO. TDI is not a BSR
cell.


The DCTL, ACTL, TCTL, and OCTL cells do not correspond
to pins of the i860 XP microprocessor; rather,
they control the bidirectional and tristate pins:


DCTL   D63-D0, DP7-DP0


ACTL   A31-A3


TCTL   Tristate outputs: ADS#, BE7#-BE0#,
       CACHE#, CTYP, D/C#, KB0, KB1, LEN,
       M/IO#, NENE#, PCD, PCYC, PWT, W/R#


OCTL   Outputs not floated in normal operation:
       BREQ, HIT#, HITM#, HLDA, LOCK#,
       PCHK#


If a value of one is loaded into any of these control
latches, the associated pins will not drive the external
bus while running EXTEST.


The values of DCTL, ACTL, TCTL, and OCTL are 
undefined during the SAMPLE instruction. 
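
The relationship between the four control cells and the pin groups can be captured in a lookup table when preparing EXTEST vectors. The sketch below is an illustration only. The pin groups are copied from the list above; the dictionary layout and the function name are hypothetical.

```python
# Pin groups governed by each control cell, as listed in section 6.5.
CONTROL_GROUPS = {
    "DCTL": ["D63-D0", "DP7-DP0"],
    "ACTL": ["A31-A3"],
    "TCTL": ["ADS#", "BE7#-BE0#", "CACHE#", "CTYP", "D/C#", "KB0", "KB1",
             "LEN", "M/IO#", "NENE#", "PCD", "PCYC", "PWT", "W/R#"],
    "OCTL": ["BREQ", "HIT#", "HITM#", "HLDA", "LOCK#", "PCHK#"],
}


def pins_driven_in_extest(control_values):
    """Return the pin groups that drive the external bus during EXTEST.

    control_values maps each control cell name to the bit loaded into it.
    A value of one stops the associated pins from driving the bus.
    """
    driven = []
    for cell, pins in CONTROL_GROUPS.items():
        if control_values[cell] == 0:
            driven.extend(pins)
    return driven
```

For example, loading one into DCTL and OCTL and zero into ACTL and TCTL leaves only the address pins and the TCTL outputs driving the bus.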




The values and direction of I/O and outputs do not
change during the scanning process (that is, during
Shift-DR states). They only change after scanning is
completed (in the Update-DR state).


The decision table, Table 6.3, defines how the
boundary scan instructions EXTEST and SAMPLE/
PRELOAD utilize BSR.


Table 6.3. Instruction Functions

                        EXTEST                      SAMPLE/PRELOAD
                        LOW          HIGH           LOW          HIGH
Input BSR cells...      ...sample values driven     ...sample values driven
                        to processor by system.     to processor by system.
Values of input cells
used by processor?      NO                          YES
Output BSR cells...     ...drive output pins        ...sample values driven
                        with cell values.           by processor.
Input/output BSR        Treat as     Treat as       Treat as     Treat as
cells...                output       input          output       input


Figure 6.4. Boundary Scan Register Ordering


6.6 TAP Controller Initialization


TAP can be initialized by applying a high signal level
on the TMS input for five periods of TCK or by activating
the TRST# input pin. TCK does not have to
be running in order to initialize TAP with the TRST#
pin. TRST# is provided with an internal pull-up resistor;
so, even if an open circuit fault occurs, the TAP
logic can still be used.


7.0 MECHANICAL DATA


Figures 7.1 and 7.2 show the locations of pins; Tables
7.1 and 7.2 help to locate pin identifiers.
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Table 7.1. Pin Cross Reference by Location 


Location   Signal   Location   Signal   Location   Signal   Location   Signal
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Table 7.1. Pin Cross Reference by Location (Continued) 


Location   Signal   Location   Signal   Location   Signal   Location   Signal


CACHE#
CLK
CTYP
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Table 7.2. Pin Cross Reference by Pin Name (Continued) 


Signal      Location    Signal      Location    Signal      Location    Signal      Location


EADS#       Vss
FLINE#      Vss
HIT#        Vss
HITM#       Vss
HLDA        Vss
HOLD        Vss
INT/CS8     Vss
INV         Vss
KB0         Vss
KB1         Vss
KEN#        Vss
LEN         Vss
LOCK#       Vss
M/IO#       Vss
NA#         Vss
NENE#       Vss
PCD         Vss
PCHK#       Vss
PCYC        Vss
PEN#        Vss
PWT         Vss
RESET       Vss
SPARE       Vss
EWBE#       Vss
BYPASS#     Vss
TCK         Vss
TDI         Vss
TDO         Vss
TMS         Vss
TRST#       Vss
Vcc         Vss
Vcc         Vss
Vcc         W/R#
Vcc         WB/WT#
Vcc
Vcc
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Table 7.3. Ceramic PGA Package Dimension Symbols 


Description of Dimensions


Distance from seating plane to highest point of body
Distance between seating plane and base plane (lid)
Distance from base plane to highest point of body
Distance from seating plane to bottom of body
Linear spacing between true lead position centerlines


NOTES:
1. Controlling dimension: millimeter.
2. Dimension "e1" ("e") is noncumulative.
3. Seating plane (standoff) is defined by P.C. board hole size: 0.0415-0.0430 inch.
4. Dimensions "B", "B1", and "C" are nominal.
5. Details of Pin 1 identifier are optional.


Family: Ceramic Pin Grid Array Package


Figure 7.3. 262-Lead Ceramic PGA Package Dimensions
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8.0 PACKAGE THERMAL 
SPECIFICATIONS


For this section, let:
P = maximum power consumption
Tc = case temperature
Ta = ambient air temperature
θca = thermal resistance from case to ambient air
θjc = thermal resistance from junction to case
θja = thermal resistance from junction to ambient air


The i860 XP microprocessor is specified for operation
when Tc is within the range of 0°C-85°C. Tc may
be measured in any environment to determine
whether the i860 XP microprocessor is within the
specified operating range. The case temperature should
be measured at the center of the top surface opposite
the pins.


Ta can be calculated from θca with the following
equation:


Ta = Tc - P * θca




Typical values for θca at various airflows and for θjc
are given in Table 8.1 for the 1.95 sq. in., 262 pin,
ceramic PGA. θjc is shown so that θja can be calculated by:


θja = θjc + θca


Note that θjc with a heatsink differs from θjc without
a heatsink because case temperature is measured
differently. Case temperature for θjc with
heatsink is measured at the center of the heat fin
base. Case temperature for θjc without heatsink is
measured at the center of the package top surface.


Table 8.2 shows the maximum Ta allowable (without
exceeding Tc) at various airflows.


Note that Ta is greatly improved by attaching "fins"
or a "heat sink" to the package. P (the maximum
power consumption) is calculated by using the maximum
Icc at 5V as tabulated in the D.C. Characteristics
of section 9.


Figure 8.1 gives typical Icc derating with case temperature.
For more information on heat sinks, measurement
techniques, or package characteristics, refer
to Intel Packaging Handbook, order number
240800.
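
The two relations above can be applied directly once θca, θjc, and P are known. The sketch below shows the arithmetic only. The numeric inputs in the example are hypothetical placeholders, not values from Table 8.1 or the D.C. Characteristics, and the function names are illustrative.

```python
def theta_ja(theta_jc, theta_ca):
    """Junction-to-ambient thermal resistance in °C/Watt: θja = θjc + θca."""
    return theta_jc + theta_ca


def max_ambient(tc_max, power, theta_ca):
    """Highest ambient temperature that keeps the case at or below tc_max.

    Implements Ta = Tc - P * θca with Tc in °C, P in watts, θca in °C/Watt.
    """
    return tc_max - power * theta_ca


# Hypothetical inputs for illustration only: 5.0 W and θca = 6.0 °C/Watt.
example_ta = max_ambient(tc_max=85.0, power=5.0, theta_ca=6.0)
```

With those placeholder inputs the allowable ambient temperature works out to 55°C. Real designs should take θca from Table 8.1 at the actual airflow and P from the maximum Icc at 5V.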


Table 8.1. Thermal Resistance (°C/Watt)


θca as a Function of Airflow, ft/min (m/sec)

With Heat Sink*
Without Heat Sink


NOTE:
* Nine-fin, unidirectional heat sink (fin dimensions: 0.250" height, 0.040" fin width, 0.100" center-to-center spacing, 1.730" length).


Table 8.2. Maximum Ta at Various Airflows (°C)


Airflow, ft/min (m/sec)

Ta with Heat Sink*
Ta without Heat Sink


NOTE:
* Nine-fin, unidirectional heat sink (fin dimensions: 0.250" height, 0.040" fin width, 0.100" center-to-center spacing, 1.730" length).


Figure 8.1. Icc Derating with Case Temperature
(Horizontal axis: temperature, degrees Centigrade)


9.0 ELECTRICAL DATA


All input and output timings are specified relative to
the 1.5V level of the rising edge of CLK and refer to
the point that the signal reaches 1.5V.


9.1 Absolute Maximum Ratings


Case Temperature Tc under Bias ...... 0°C to 85°C
Storage Temperature .......... -65°C to +150°C
Voltage on Any Pin with
Respect to Ground .......... -0.5 to Vcc + 0.5V


NOTICE: This data sheet contains preliminary information
on new products in production. The specifications
are subject to change without notice. Verify with
your local Intel Sales office that you have the latest
data sheet before finalizing a design.


*WARNING: Stressing the device beyond the "Absolute
Maximum Ratings" may cause permanent damage.
These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended
exposure beyond the "Operating Conditions"
may affect device reliability.


9.2 D.C. Characteristics


Table 9.1. D.C. Characteristics
Operating Conditions: Vcc = 5V ±5%; Tc = 0°C to 85°C


Output HIGH voltage (TTL)
Power supply current (@ 50 MHz)
I/O or output capacitance


NOTES:

1. This parameter is measured with current load of 5 mA.

2. This parameter is measured with current load of 1 mA. Typical value is Vcc - 0.45V.

3. Measured at 50 MHz and Vcc = 5V.

4. This parameter is for inputs without pullups. Vcc is on, and 0V < Vin < Vcc.

5. This parameter is for inputs with pullups and Vin = 0.45V. Note that if the pull-ups are put in high-impedance state via the
DCTL boundary scan cell that also tri-states the data outputs, then the leakage is ±15 µA.

6. 0.45V < Vin < Vcc - 0.45V.

7. These parameters are not tested; they are guaranteed by design characterization.
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9.3 A.C. Characteristics 


Table 9.2. A.C. Characteristics
CL = 0 pF Unless Otherwise Specified; Vcc = 5V ±5%; Tc = 0°C to 85°C


Symbol    Parameter

tc        CLK Period
ttc       TCK Period
          CLK Stability
tch       CLK High Time
tcl       CLK Low Time
tr        CLK Rise Time
tf        CLK Fall Time
ts        TCK to CLK Skew
ttch      TCK High Time
ttcl      TCK Low Time
ttcr      TCK Rise Time
ttcf      TCK Fall Time
tsu.1     RESET, HOLD, BERR, FLINE#, PEN#, INT/CS8 Setup Time
tsu.2     BOFF#, AHOLD, KEN#, NA#, INV, WB/WT# Setup Time
tsu.3     EADS# Setup Time
tsu.4     EWBE# Setup Time
tsu.5     BRDY# Setup Time
tsu.6     D63-D0, DP7-DP0 Setup Time
tsu.7     D63-D0, DP7-DP0 Setup Time (Late Backoff Mode)
tsu.8     A31-A5 Setup Time
ttsu      TDI, TMS, TRST# Setup Time
tth       TDI, TMS, TRST# Hold Time
th.1      Hold Time, All Inputs except D63-D0, DP7-DP0
th.2      D63-D0, DP7-DP0 Hold Time (Normal and Late Back-Off Mode)
ttco      TDO Valid Delay and All Outputs Valid Delay in EXTEST Mode
tco.1     A31-A22 Valid Delay
tco.2a    A21-A3 Valid Delay (High Current Mode)
tco.2b    A21-A3 Valid Delay (Normal Current Mode)
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Table 9.2. A.C. Characteristics (Continued) | | : 
CL = 0 pF Unless Otherwise Specified; Vcc = 5V ±5%; Tc = 0°C to 85°C


Symbol    Parameter

tco.3     D63-D0, DP7-DP0 Valid Delay
tco.4     BREQ, HLDA, PCHK#, NENE#, KB0, KB1 Valid Delay
tco.5a    ADS# Valid Delay (High Current Mode)
tco.5b    ADS# Valid Delay (Normal Current Mode)
tco.6a    W/R# Valid Delay (High Current Mode)
tco.6b    W/R# Valid Delay (Normal Current Mode)
tco.7a    HITM# Valid Delay (High Current Mode)
tco.7b    HITM# Valid Delay (Normal Current Mode)
tco.8     PWT, PCD, HIT#, CTYP, D/C#, M/IO#, PCYC, LOCK#, CACHE#, LEN Valid Delay
tco.9a    BE0#-BE7# Valid Delay (High Current Mode)
tco.9b    BE0#-BE7# Valid Delay (Normal Current Mode)
tz.1      Float Time All Outputs except D63-D0, DP7-DP0
tz.2      Float Time: D63-D0, DP7-DP0
ttz       Float Time during Boundary Scan EXTEST


NOTES:

a. Minimum and maximum delays are for 0 pF load.

b. These hold times are referenced to the falling edge of TCK.

c. These hold times are referenced to the rising edge of CLK.

d. Output delay for D63-D0, DP7-DP0 is from the CLK after ADS# activation.

e. Float time = delay until maximum output current is less than ±ILO. Float time is not tested.

f. Delay from falling edge of TCK.

g. These pins can be configured as normal or high-current buffers. When they are configured as high-current buffers for
interface with cache memory or other large loads, use the derating curves in Figure 9.3. Otherwise, all normal buffers use
the derating curves in Figure 9.4.

h. tr and tf should be measured between 0.8V and 2.5V.

i. Assumes TCK and CLK both at 25 MHz.


Figure 9.1. CLK, Input, and Output Timings


Figure 9.2. TAP Signal Timings
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PEt - 


Maximum Valid Delay
(Vertical axis: typical* delay (ns) vs. nominal; horizontal axis: load (pF))


NOTES:

Graphs are not linear outside the CL range shown.
NOMINAL = 0 pF value given in the A.C. Timings table.
*Typical part under worst-case conditions.


Figure 9.3. Typical Output Delay vs Load Capacitance


All Outputs (In Normal Mode)
(Vertical axis: typical* delay (ns) vs. nominal; horizontal axis: load (pF))


NOTES:

Graphs are not linear outside the CL range shown.
NOMINAL = 0 pF value given in the A.C. Timings table.
*Typical part under worst-case conditions.


Figure 9.4. Typical Output Delay vs Load Capacitance
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ALL OUTPUTS 
(In Normal Mode) 


(Vertical axis: typical* slew time (ns), 0.8V-2.0V; horizontal axis: load capacitance, CL (pF))


NOTES:
Graphs are not linear outside the CL range shown.
*Typical part under worst-case conditions.


Figure 9.5a. Typical Slew Time vs Load Capacitance under Worst-Case Conditions (Rising Voltage)


ALL OUTPUTS
(In Normal Mode)

ADS#, A21-A3, BE7#-BE0#, W/R#, HITM#
(In High Current Mode)

(Vertical axis: typical* slew time (ns), 2.0V-0.8V; horizontal axis: load capacitance, CL (pF))


NOTES:
Graphs are not linear outside the CL range shown.
*Typical part under worst-case conditions.


Figure 9.5b. Typical Slew Time vs Load Capacitance under Worst-Case Conditions (Falling Voltage)
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os 
(Horizontal axis: frequency (MHz))


NOTES:
Graph is not linear outside the frequency range shown.
*Worst-case supply current at 5V.


Figure 9.6. Typical Icc vs. Frequency
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9.4 Component Buffer Model 


9.4.1 FIRST ORDER ELECTRICAL BUFFER MODEL


The first order electrical buffer model provides an
accurate and simple representation of the buffers
used in the inputs and outputs of the CHMOS i860
XP CPU. The output model consists of four components:


1. Linear voltage waveform (dV/dt)
2. Intrinsic buffer delay due to CL (t0)
3. Buffer output impedance (Ro)
4. Buffer output capacitance (Co)


as shown in Figure 9.7a.


A fitting algorithm has been used to arrive at values
for dV/dt, t0, Co, and Ro such that Ro matches the
actual buffer impedance and Co, the intrinsic buffer
output capacitance whether the output is on or off,
remains constant across the operating range while
minimizing the difference between the full buffer circuit
and its simplified electrical model for a set of
different loads (lumped capacitance, and short and
long transmission lines). dV/dt is the slope of the
voltage ramp, while t0 is the intrinsic buffer delay
associated with a given CL. t0 accounts for the intrinsic
delay by offsetting the excitation of the model by
the amount of the delay.


NOTE:
t0 is zero for CL = 0 and when the load is represented
by a transmission line.
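
One way to exercise the four-component output model with a lumped capacitive load is a simple time-step simulation. The sketch below is an interpretation, not an Intel-supplied model: it assumes the excitation is a voltage ramp of slope dV/dt that starts at t0 and stops at the full swing, driving Co in parallel with the load CL through Ro. All parameter values in the example are hypothetical and do not come from Tables 9.3 or 9.4.

```python
def simulate_output(slope_v_per_ns, t0_ns, ro_ohms, co_pf, cl_pf,
                    v_swing=5.0, t_end_ns=30.0, dt_ns=0.01):
    """Forward-Euler simulation of a delayed ramp source driving an RC node.

    Returns a list of (time in ns, node voltage in V) samples.
    """
    # ohm * pF gives picoseconds; scale to nanoseconds.
    tau_ns = ro_ohms * (co_pf + cl_pf) * 1e-3
    v = 0.0
    trace = []
    steps = int(round(t_end_ns / dt_ns))
    for n in range(steps + 1):
        t = n * dt_ns
        trace.append((t, v))
        # Ramp excitation offset by the intrinsic delay t0 and clipped to the swing.
        v_src = min(max(slope_v_per_ns * (t - t0_ns), 0.0), v_swing)
        v += (v_src - v) * dt_ns / tau_ns
    return trace


# Hypothetical parameter values, chosen only to show the shape of the response.
trace = simulate_output(slope_v_per_ns=2.0, t0_ns=1.0,
                        ro_ohms=30.0, co_pf=4.0, cl_pf=50.0)
```

The node stays at zero until t0, then follows the ramp with a lag set by Ro(Co + CL) and settles at the full swing. The time step must stay well below that time constant for the Euler update to remain accurate.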


Figure 9.7a. Output Model


The input model consists of one component, buffer
capacitance (Cin), as shown in Figure 9.7b.


Figure 9.7b. Input Model
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9.4.2 FIRST ORDER ELECTRICAL MODEL 
PARAMETER VALUES 


The parameters that make up the first order electrical
model vary with the buffer design. In addition,
these parameters also vary with the operating condition
(i.e., temperature and Vcc) of the buffer process.
The typical process corner is being modeled.
Two sizes of buffer are used on these components,
labelled here as small and large. The parameter values
found in Tables 9.3 and 9.4 list dV/dt, t0, Ro, and
Co. These parameters are provided for both low-to-high
and high-to-low transitions at the typical process
corner for three operating conditions (Vcc =
5.5V and TJ = -10°C, Vcc = 5.0V and TJ = 80°C,
and Vcc = 4.5V and TJ = 125°C).


9.4.3 PACKAGE PARAMETERS 


In addition to the buffer characteristics, package
characteristics are also included to complete the
model. Package inductance, capacitance and resistance
values vary with design geometry and material
properties of the package. Figure 9.8 shows a model
of the package including these parameters. The package
model should be placed between the first order electrical
buffer model and the board interconnects, as shown
in Figure 9.9. Notice the package model only includes
the package inductance (Lp) and capacitance (Cp).
This is sufficient since the package resistance
is so small it is negligible.


Table 9.5 lists the buffer model parameters for each
pin of the i860 XP microprocessor. The table gives
the package model parameters for each pin, followed
by the input capacitance (input and I/O pins)
and/or output buffer size (outputs and I/O). In those
cases where the buffer used by a pin is an option
selected at reset by the PEN# input, the output buffer
column lists the sizes available. Large buffers correspond
to high-current mode, while small buffers
correspond to normal current mode.
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9.4.4 BOARD INTERCONNECTS 


The board interconnect can be considered as a
lumped parameter (capacitive load) or as a transmission
line. As a rule of thumb, an unterminated board
interconnect may be considered as a capacitive load
if the round trip time (time for signal to travel from
one end of the interconnect to the other and back) is
short compared to the transition time of the signal.
At frequencies of 50 MHz and above most interconnects
behave as transmission lines (Figure 9.10).
For accurate results at high frequencies, these
transmission line effects must be taken into account
and modeled.
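
The rule of thumb can be turned into a quick screening check for each net. The sketch below is illustrative. The data sheet does not say how short "short compared to" is, so the ratio used as the threshold is an assumption, as are the propagation delay per inch and the function names.

```python
def round_trip_ns(length_in, delay_ns_per_in):
    """Time for a signal to travel to the far end of the trace and back."""
    return 2.0 * length_in * delay_ns_per_in


def is_lumped_load(length_in, delay_ns_per_in, transition_ns, ratio=0.25):
    """Return True if the trace may be treated as a lumped capacitive load.

    ratio is an assumed threshold: the round trip time must not exceed
    ratio * transition_ns for the trace to count as "short".
    """
    return round_trip_ns(length_in, delay_ns_per_in) <= ratio * transition_ns


# Hypothetical numbers: 0.18 ns/inch propagation delay, 4 ns signal transition.
short_trace = is_lumped_load(2.0, 0.18, 4.0)
long_trace = is_lumped_load(10.0, 0.18, 4.0)
```

Under these placeholder numbers a 2-inch trace qualifies as a lumped load and a 10-inch trace does not, so the longer net would need the transmission line model of Figure 9.10.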


Figure 9.8. Package Model


Figure 9.9b. Input Buffer and Package Model


Figure 9.10. Transmission Line Model
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Table 9.3. Small Output Buffer First Order Electrical Model Parameter Values 


Transition   TJ (°C)   Ro (ohms)   Co (pF)   dV/dt   t0 (ns) at various CL
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Table 9.4. Large Output Buffer First Order Electrical Model Parameter Values 
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Table 9.5 Buffer Models 


Pin Name    Location    Cp (pF)    Lp (nH)    Input Buffer    Output Buffer Size
                        Typical    Typical    Cin (pF)        (Large or Small)
                                              Typical

ADS#        N04
AHOLD       Q05
BE0#
BE1#
BE2#
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10.0 INSTRUCTION SET 


Key to abbreviations: 


For register operands, the abbreviations that describe
the operands are composed of two parts. The
first part describes the type of register:


c        One of the control registers fir, psr, epsr,
         dirbase, db, fsr, bear, ccr, p0, p1, p2, or p3

f        One of the floating-point registers: f0 through f31

i        One of the integer registers: r0 through r31


The second part identifies the field of the machine
instruction into which the operand is to be placed:


src1     The first of the two source-register designators,
         which may be either a register or a
         16-bit immediate constant or address offset.
         The immediate value is zero-extended
         for logical operations and is sign-extended
         for add and subtract operations (including
         addu and subu) and for all addressing
         calculations.

src1ni   Same as src1 except that no immediate
         constant or address offset value is permitted.

src1s    Same as src1 except that the immediate
         constant is a 5-bit value that is zero-extended
         to 32 bits.

src2     The second of the two source-register designators.

dest     The destination register designator.


Thus, the operand specifier isrc2, for example,
means that an integer register is used and that the
encoding of that register must be placed in the src2
field of the machine instruction.
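
The two extension rules for a 16-bit src1 immediate can be shown side by side. The sketch below models only the widening step described above. The function name and the boolean flag are hypothetical, and the result is returned as an unsigned 32-bit pattern.

```python
def extend_imm16(imm16, logical):
    """Widen a 16-bit src1 immediate to 32 bits.

    logical=True  -> zero-extend (logical operations).
    logical=False -> sign-extend (add, subtract, and addressing calculations).
    """
    imm16 &= 0xFFFF
    if logical or imm16 < 0x8000:
        return imm16
    return imm16 | 0xFFFF0000  # copy the sign bit into the upper 16 bits
```

For the immediate 0x8000, a logical operation sees 0x00008000 while an add or an address calculation sees 0xFFFF8000. Immediates below 0x8000 are the same under both rules.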


Other (nonregister) operands are specified by a one-part abbreviation
that represents both the type of operand required and the instruction
field into which the value of the operand is placed:

#const  A 16-bit immediate constant or address offset that the i860 XP
        microprocessor sign-extends to 32 bits when computing the
        effective address.
lbroff  A signed, 26-bit, immediate, relative branch offset.
sbroff  A signed, 16-bit, immediate, relative branch offset.




brx     A function that computes the target address by shifting the
        offset (either lbroff or sbroff) left by two bits,
        sign-extending it to 32 bits, and adding the result to the
        current instruction pointer plus four. The resulting target
        address may lie anywhere within the address space.
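
A minimal Python sketch of the brx computation follows, assuming the offset is supplied as the raw field value and that addresses wrap at 32 bits. The parameter names (offset, width, ip) are hypothetical and chosen only for this example.

```python
def brx(offset: int, width: int, ip: int) -> int:
    """Compute a branch target as described for brx.

    offset: raw branch-offset field (lbroff or sbroff)
    width:  26 for lbroff, 16 for sbroff
    ip:     address of the current instruction
    Assumption: the result wraps modulo 2**32.
    """
    offset &= (1 << width) - 1
    shifted = offset << 2                 # shift left by two bits
    if shifted & (1 << (width + 1)):      # sign bit of the shifted value
        shifted -= 1 << (width + 2)       # sign-extend
    return (ip + 4 + shifted) & 0xFFFFFFFF


# An offset field of all ones is -1, i.e. a displacement of -4 bytes.
print(hex(brx(0x3FFFFFF, 26, 0x1000)))
```

Because the offset counts instructions rather than bytes, a 26-bit lbroff spans a 28-bit byte range around the current instruction.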


Table 10.1. Precision Specification

Suffix    Source Precision    Result Precision
.ss       single              single
.sd       single              double
.dd       double              double
.ds       double              single

Unless otherwise specified, floating-point operations accept single- or
double-precision source operands and produce a result of equal or
greater precision. Both input operands must have the same precision.
The source and result precision are specified by a two-letter suffix to
the mnemonic of the operation.


Other abbreviations include:

.p      Precision specification .ss, .sd, or .dd (.ds not permitted).
        Refer to Table 10.1.
.r      Precision specification .ss, .sd, .ds, or .dd. Refer to
        Table 10.1.
.v      .sd or .dd. Refer to Table 10.1.
.w      .ss or .dd. Refer to Table 10.1.
.x      .b (8 bits), .s (16 bits), or .l (32 bits)
.y      .l (32 bits), .d (64 bits), or .q (128 bits)

mem.x(address)          The memory location indicated by address with a
                        size of x.
port.x(address)         The I/O port indicated by address with a size
                        of x.
int_vector.x(address)   The interrupt vector with a size of x returned
                        from I/O port address.
PM                      The pixel mask, which is considered as an array
                        of eight bits PM(7)..PM(0), where PM(0) is the
                        least-significant bit.
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10.1 Instruction Definitions in Alphabetical Order 


adds isrc1, isrc2, idest ........................ Add Signed
    idest ← isrc1 + isrc2
    OF ← (bit 31 carry ≠ bit 30 carry)
    CC set if isrc2 + isrc1 < 0 (signed)
    CC clear if isrc2 + isrc1 ≥ 0 (signed)
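
The flag rules for adds can be modeled as below. This is a sketch under two assumptions that the data sheet does not spell out: "bit 30 carry" is taken to mean the carry out of bit 30 into bit 31, and the CC test is taken on the true (untruncated) signed sum. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def adds(isrc1: int, isrc2: int):
    """Return (idest, OF, CC) for a signed add of two 32-bit patterns."""
    a, b = isrc1 & MASK32, isrc2 & MASK32
    total = a + b
    carry31 = (total >> 32) & 1                                  # carry out of bit 31
    carry30 = (((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF)) >> 31) & 1  # carry out of bit 30
    of = carry31 ^ carry30             # overflow when the two carries differ
    cc = 1 if to_signed(a) + to_signed(b) < 0 else 0
    return total & MASK32, of, cc


print(adds(0x7FFFFFFF, 1))
```

Adding 1 to the largest positive value sets OF, because the carry into bit 31 is not matched by a carry out of it.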


addu isrc1, isrc2, idest ........................ Add Unsigned
    idest ← isrc1 + isrc2
    OF ← bit 31 carry
    CC ← bit 31 carry


and isrc1, isrc2, idest ......................... Logical AND
    idest ← isrc1 and isrc2
    CC set if result is zero, cleared otherwise


andh #const, isrc2, idest ....................... Logical AND High
    idest ← (#const shifted left 16 bits) and isrc2
    CC set if result is zero, cleared otherwise


andnot isrc1, isrc2, idest ...................... Logical AND NOT
    idest ← (not isrc1) and isrc2
    CC set if result is zero, cleared otherwise


andnoth #const, isrc2, idest .................... Logical AND NOT High
    idest ← (not (#const shifted left 16 bits)) and isrc2
    CC set if result is zero, cleared otherwise


bc lbroff ....................................... Branch on CC
    IF CC = 1
    THEN continue execution at brx(lbroff)
    FI


bc.t lbroff ..................................... Branch on CC, Taken
    IF CC = 1
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI


bla isrc1ni, isrc2, sbroff ...................... Branch on LCC and Add
    LCC-temp clear if isrc2 + isrc1ni < 0 (signed)
    LCC-temp set if isrc2 + isrc1ni ≥ 0 (signed)
    isrc2 ← isrc1ni + isrc2
    Execute one more sequential instruction
    IF LCC
    THEN LCC ← LCC-temp
         continue execution at brx(sbroff)
    ELSE LCC ← LCC-temp
    FI


bnc lbroff ...................................... Branch on Not CC
    IF CC = 0
    THEN continue execution at brx(lbroff)
    FI


bnc.t lbroff .................................... Branch on Not CC, Taken
    IF CC = 0
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI
br lbroff ....................................... Branch Direct Unconditionally
    Execute one more sequential instruction.
    Continue execution at brx(lbroff).


bri [isrc1ni] ................................... Branch Indirect Unconditionally
    Execute one more sequential instruction
    IF any trap bit in psr is set
    THEN copy PU to U, PIM to IM in psr
         clear trap bits
         IF DS is set and DIM is reset
         THEN enter dual-instruction mode after executing one
              instruction in single-instruction mode
         ELSE IF DS is set and DIM is set
              THEN enter single-instruction mode after executing one
                   instruction in dual-instruction mode
              ELSE IF DIM is set
                   THEN enter dual-instruction mode
                        for next instruction pair
                   ELSE enter single-instruction mode
                        for next instruction pair
                   FI
              FI
         FI
    FI
    Continue execution at address in isrc1ni
    (The original contents of isrc1ni is used even if the next instruction
    modifies isrc1ni. Does not trap if isrc1ni is misaligned.)


bte isrc1s, isrc2, sbroff ....................... Branch If Equal
    IF isrc1s = isrc2
    THEN continue execution at brx(sbroff)
    FI


btne isrc1s, isrc2, sbroff ...................... Branch If Not Equal
    IF isrc1s ≠ isrc2
    THEN continue execution at brx(sbroff)
    FI


call lbroff ..................................... Subroutine Call
    r1 ← address of next sequential instruction + 4 (or + 8 in dual mode)
    Execute one more sequential instruction
    Continue execution at brx(lbroff)


calli [isrc1ni] ................................. Indirect Subroutine Call
    r1 ← address of next sequential instruction + 4 (or + 8 in dual mode)
    Execute one more sequential instruction
    Continue execution at address in isrc1ni
    (The original contents of isrc1ni is used even if the next instruction
    modifies isrc1ni. Does not trap if isrc1ni is misaligned. The
    register isrc1ni must not be r1.)


fadd.p fsrc1, fsrc2, fdest ...................... Floating-Point Add
    fdest ← fsrc1 + fsrc2


faddp fsrc1, fsrc2, fdest ....................... Add with Pixel Merge
    fdest ← fsrc1 + fsrc2 (using integer arithmetic; 8-byte operands and destination)
    Shift and load MERGE register from fsrc1 + fsrc2 as defined in Table 10.2


faddz fsrc1, fsrc2, fdest ....................... Add with Z Merge
    fdest ← fsrc1 + fsrc2 (using integer arithmetic; 8-byte operands and destination)
    Shift MERGE right 16 and load fields 31..16 and 63..48 from fsrc1 + fsrc2


famov.r fsrc1, fdest ............................ Floating-Point Adder Move
    fdest ← fsrc1


fiadd.w fsrc1, fsrc2, fdest ..................... Long-Integer Add
    fdest ← fsrc1 + fsrc2 (2's complement integer arithmetic)


fisub.w fsrc1, fsrc2, fdest ..................... Long-Integer Subtract
    fdest ← fsrc1 - fsrc2 (2's complement integer arithmetic)


fix.v fsrc1, fdest .............................. Floating-Point to Integer Conversion
    fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1 rounded

                                                  Floating-Point Load
fld.y isrc1(isrc2), fdest ....................... (Normal)
fld.y isrc1(isrc2)++, fdest ..................... (Autoincrement)
    fdest ← mem.y (isrc1 + isrc2)
    IF autoincrement
    THEN isrc2 ← isrc1 + isrc2
    FI


                                                  Cache Flush
flush #const(isrc2) ............................. (Normal)
flush #const(isrc2)++ ........................... (Autoincrement)
    Write back (if modified) the line in data cache that has address (#const + isrc2)
    80860XR: and set tag value to (#const + isrc2).
    80860XP: and invalidate its virtual and physical tags.
    Contents of line undefined.
    IF autoincrement
    THEN isrc2 ← #const + isrc2
    FI


fmlow.dd fsrc1, fsrc2, fdest .................... Floating-Point Multiply Low
    fdest ← low-order 53 bits of (fsrc1 mantissa × fsrc2 mantissa)
    fdest bit 53 ← most significant bit of (fsrc1 mantissa × fsrc2 mantissa)


fmov.r fsrc1, fdest ............................. Floating-Point Reg-Reg Move
    Assembler pseudo-operation
    fmov.ss fsrc1, fdest = fiadd.ss fsrc1, f0, fdest
    fmov.dd fsrc1, fdest = fiadd.dd fsrc1, f0, fdest
    fmov.sd fsrc1, fdest = famov.sd fsrc1, fdest
    fmov.ds fsrc1, fdest = famov.ds fsrc1, fdest


fmul.p fsrc1, fsrc2, fdest ...................... Floating-Point Multiply
    fdest ← fsrc1 × fsrc2


fnop ............................................ Floating-Point No Operation
    Assembler pseudo-operation
    fnop = shrd r0, r0, r0


form fsrc1, fdest ............................... OR with MERGE Register
    fdest ← fsrc1 OR MERGE
    MERGE ← 0


frcp.p fsrc2, fdest ............................. Floating-Point Reciprocal
    fdest ← 1 / fsrc2 with maximum mantissa error < 2^-7


frsqr.p fsrc2, fdest ............................ Floating-Point Reciprocal Square Root
    fdest ← 1 / sqrt(fsrc2) with maximum mantissa error < 2^-7


                                                  Floating-Point Store
fst.y fdest, isrc1(isrc2) ....................... (Normal)
fst.y fdest, isrc1(isrc2)++ ..................... (Autoincrement)
    mem.y (isrc2 + isrc1) ← fdest
    IF autoincrement
    THEN isrc2 ← isrc1 + isrc2
    FI


fsub.p fsrc1, fsrc2, fdest ...................... Floating-Point Subtract
    fdest ← fsrc1 - fsrc2


ftrunc.v fsrc1, fdest ........................... Floating-Point to Integer Conversion
    fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1


fxfr fsrc1, idest ............................... Transfer F-P to Integer Register
    idest ← fsrc1


fzchkl fsrc1, fsrc2, fdest ...................... 32-Bit Z-Buffer Check
    Consider the 64-bit operands as arrays of two 32-bit
    fields fsrc1(1)..fsrc1(0), fsrc2(1)..fsrc2(0), and fdest(1)..fdest(0)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 2 bits
    FOR i = 0 to 1
    DO
        PM[i + 6] ← fsrc2(i) < fsrc1(i) (unsigned)
        fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0


fzchks fsrc1, fsrc2, fdest ...................... 16-Bit Z-Buffer Check
    Consider the 64-bit operands as arrays of four 16-bit
    fields fsrc1(3)..fsrc1(0), fsrc2(3)..fsrc2(0), and fdest(3)..fdest(0)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 4 bits
    FOR i = 0 to 3
    DO
        PM[i + 4] ← fsrc2(i) < fsrc1(i) (unsigned)
        fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0


intovr .......................................... Software Trap on Integer Overflow
    IF OF = 1
    THEN generate trap with IT set in psr
    FI


ixfr isrc1ni, fdest ............................. Transfer Integer to F-P Register
    fdest ← isrc1ni


ld.c csrc2, idest ............................... Load from Control Register
    idest ← csrc2


ld.x isrc1(isrc2), idest ........................ Load Integer
    idest ← mem.x (isrc1 + isrc2)


ldint.x isrc2, idest ............................ Load Interrupt Vector
    idest ← int_vector.x (isrc2)
    NOTE: Not available with the i860 XR CPU


ldio.x isrc2, idest ............................. Load I/O
    idest ← port.x (isrc2)
    NOTE: Not available with the i860 XR CPU


lock ............................................ Begin Interlocked Sequence
    Set BL in dirbase.
    The next load or store that appears on the bus locks that location.
    Disable interrupts until the bus is unlocked.


mov isrc2, idest ................................ Register-Register Move
    Assembler pseudo-operation
    mov isrc2, idest = shl r0, isrc2, idest


mov const32, idest .............................. Constant-to-Register Move
    Assembler pseudo-operation
    when 0xFFFF8000 ≤ const32 < 0x8000 ...
        adds l%const32, r0, idest
    otherwise ...
        orh h%const32, r0, idest
        or l%const32, idest, idest
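
The constant-to-register expansion can be checked with a small Python model. It assumes that h% and l% select the high and low 16 bits of const32, that the range test is applied to const32 as a signed 32-bit value, and that r0 always reads as zero. The functions expand_mov and run are hypothetical helpers written for this sketch, not part of any Intel assembler.

```python
MASK32 = 0xFFFFFFFF


def expand_mov(const32: int):
    """Return the instruction sequence as (mnemonic, 16-bit immediate) pairs."""
    const32 &= MASK32
    low, high = const32 & 0xFFFF, const32 >> 16
    signed = const32 - (1 << 32) if const32 & 0x80000000 else const32
    if -0x8000 <= signed < 0x8000:
        return [("adds", low)]            # one instruction is enough
    return [("orh", high), ("or", low)]   # build the value in two halves


def run(ops) -> int:
    """Evaluate the sequence with r0 = 0 and return the final idest."""
    idest = 0
    for op, imm in ops:
        if op == "adds":                  # immediate is sign-extended
            ext = imm | 0xFFFF0000 if imm & 0x8000 else imm
            idest = ext & MASK32
        elif op == "orh":                 # immediate shifted left 16 bits, OR r0
            idest = (imm << 16) & MASK32
        elif op == "or":                  # immediate is zero-extended
            idest |= imm
    return idest


for value in (5, 0xFFFFFFFF, 0x8000, 0x12345678):
    print(hex(value), expand_mov(value), hex(run(expand_mov(value))))
```

The single-instruction form works because adds sign-extends its immediate, while the two-instruction form relies on or zero-extending the low half.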


nop ............................................. Core-Unit No Operation
    Assembler pseudo-operation
    nop = shl r0, r0, r0


or isrc1, isrc2, idest .......................... Logical OR
    idest ← isrc1 OR isrc2
    CC set if result is zero, cleared otherwise


orh #const, isrc2, idest ........................ Logical OR High
    idest ← (#const shifted left 16 bits) OR isrc2
    CC set if result is zero, cleared otherwise


pfadd.p fsrc1, fsrc2, fdest ..................... Pipelined Floating-Point Add
    fdest ← last stage adder result
    Advance A pipeline one stage
    A pipeline first stage ← fsrc1 + fsrc2


pfaddp fsrc1, fsrc2, fdest ...................... Pipelined Add with Pixel Merge
    fdest ← last-stage graphics-unit result
    last-stage graphics-unit result ← fsrc1 + fsrc2
        (using integer arithmetic; 8-byte operands and destination)
    Shift, then load MERGE register from fsrc1 + fsrc2 as defined in Table 10.2


pfaddz fsrc1, fsrc2, fdest ...................... Pipelined Add with Z Merge
    fdest ← last-stage graphics-unit result
    last-stage graphics-unit result ← fsrc1 + fsrc2
        (using integer arithmetic; 8-byte operands and destination)
    Shift MERGE right 16, then load fields 31..16 and 63..48 from fsrc1 + fsrc2


pfam.p fsrc1, fsrc2, fdest ...................... Pipelined Floating-Point Add and Multiply
    fdest ← last stage adder result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 + A-op2
    M pipeline first stage ← M-op1 × M-op2


pfamov.r fsrc1, fdest ........................... Pipelined Floating-Point Adder Move
    fdest ← last stage adder result
    Advance A pipeline one stage
    A pipeline first stage ← fsrc1


pfeq.p fsrc1, fsrc2, fdest ...................... Pipelined Floating-Point Equal Compare
    fdest ← last stage adder result
    CC set if fsrc1 = fsrc2, else cleared
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs


pfgt.p fsrc1, fsrc2, fdest ...................... Pipelined Floating-Point Greater-Than Compare
    (Assembler clears R-bit of instruction)
    fdest ← last stage adder result
    CC set if fsrc1 > fsrc2, else cleared
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs


pfiadd.w fsrc1, fsrc2, fdest .................... Pipelined Long-Integer Add
    fdest ← last-stage graphics-unit result
    last-stage graphics-unit result ← fsrc1 + fsrc2 (2's complement integer arithmetic)


pfisub.w fsrc1, fsrc2, fdest .................... Pipelined Long-Integer Subtract
    fdest ← last-stage graphics-unit result
    last-stage graphics-unit result ← fsrc1 - fsrc2 (2's complement integer arithmetic)


pfix.v fsrc1, fdest ............................. Pipelined Floating-Point to Integer Conversion
    fdest ← last stage adder result
    Advance A pipeline one stage
    A pipeline first stage ← 64-bit value with low-order 32 bits
        equal to integer part of fsrc1 rounded


                                                  Pipelined Floating-Point Load
pfld.y isrc1(isrc2), fdest ...................... (Normal)
pfld.y isrc1(isrc2)++, fdest .................... (Autoincrement)
    fdest ← mem.y (third previous pfld's (isrc1 + isrc2))
    (where .y is precision of third previous pfld.y)
    IF autoincrement
    THEN isrc2 ← isrc1 + isrc2
    FI
    NOTE: pfld.q is not available with the i860 XR CPU


pfle.p fsrc1, fsrc2, fdest ...................... Pipelined F-P Less-Than or Equal Compare
    Assembler sets R-bit of instruction
    fdest ← last stage adder result
    CC clear if fsrc1 ≤ fsrc2, else set
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs


pfmam.p fsrc1, fsrc2, fdest ..................... Pipelined Floating-Point Add and Multiply
    fdest ← last stage multiplier result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 + A-op2
    M pipeline first stage ← M-op1 × M-op2


pfmov.r fsrc1, fdest ............................ Pipelined Floating-Point Reg-Reg Move
    Assembler pseudo-operation
    pfmov.ss fsrc1, fdest = pfiadd.ss fsrc1, f0, fdest
    pfmov.dd fsrc1, fdest = pfiadd.dd fsrc1, f0, fdest
    pfmov.sd fsrc1, fdest = pfamov.sd fsrc1, fdest
    pfmov.ds fsrc1, fdest = pfamov.ds fsrc1, fdest


pfmsm.p fsrc1, fsrc2, fdest ..................... Pipelined Floating-Point Subtract and Multiply
    fdest ← last stage multiplier result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 - A-op2
    M pipeline first stage ← M-op1 × M-op2


pfmul.p fsrc1, fsrc2, fdest ..................... Pipelined Floating-Point Multiply
    fdest ← last stage multiplier result
    Advance M pipeline one stage
    M pipeline first stage ← fsrc1 × fsrc2


pfmul3.dd fsrc1, fsrc2, fdest ................... Three-Stage Pipelined Multiply
    fdest ← last stage multiplier result
    Advance 3-Stage M pipeline one stage
    M pipeline first stage ← fsrc1 × fsrc2


pform fsrc1, fdest .............................. Pipelined OR to MERGE Register
    fdest ← last-stage graphics-unit result
    last-stage graphics-unit result ← fsrc1 OR MERGE
    MERGE ← 0


pfsm.p fsrc1, fsrc2, fdest ...................... Pipelined Floating-Point Subtract and Multiply
    fdest ← last stage adder result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 - A-op2
    M pipeline first stage ← M-op1 × M-op2


pfsub.p fsrc1, fsrc2, fdest ..................... Pipelined Floating-Point Subtract
    fdest ← last stage adder result
    Advance A pipeline one stage
    A pipeline first stage ← fsrc1 - fsrc2


pftrunc.v fsrc1, fdest .......................... Pipelined Floating-Point to Integer Conversion
    fdest ← last stage adder result
    Advance A pipeline one stage
    A pipeline first stage ← 64-bit value with low-order 32 bits
        equal to integer part of fsrc1




pfzchkl fsrc1, fsrc2, fdest ..................... Pipelined 32-Bit Z-Buffer Check
    Consider the 64-bit operands as arrays of two 32-bit
    fields fsrc1(1)..fsrc1(0), fsrc2(1)..fsrc2(0), and fdest(1)..fdest(0)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 2 bits
    FOR i = 0 to 1
    DO
        PM[i + 6] ← fsrc2(i) < fsrc1(i) (unsigned)
        fdest(i) ← last-stage graphics-unit result
        last-stage graphics-unit result ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0


pfzchks fsrc1, fsrc2, fdest ..................... Pipelined 16-Bit Z-Buffer Check
    Consider the 64-bit operands as arrays of four 16-bit
    fields fsrc1(3)..fsrc1(0), fsrc2(3)..fsrc2(0), and fdest(3)..fdest(0)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 4 bits
    FOR i = 0 to 3
    DO
        PM[i + 4] ← fsrc2(i) < fsrc1(i) (unsigned)
        fdest ← last-stage graphics-unit result
        last-stage graphics-unit result(i) ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0


pst.d fdest, #const(isrc2) ...................... Pixel Store
pst.d fdest, #const(isrc2)++ .................... Pixel Store Autoincrement
    Pixels enabled by PM in mem.d (isrc2 + #const) ← fdest
    Shift PM right by 8/pixel size (in bytes) bits
    IF autoincrement
    THEN isrc2 ← #const + isrc2
    FI


scyc.x isrc2 .................................... Special Cycles
    Generate a special bus cycle (D/C# = 0, W/R# = 1, M/IO# = 0) and
    set BE7#-BE0# according to the value contained in the register isrc2.
    NOTE: Not available with the i860 XR CPU


shl isrc1, isrc2, idest ......................... Shift Left
    idest ← isrc2 shifted left by isrc1 bits


shr isrc1, isrc2, idest ......................... Shift Right
    SC (in psr) ← isrc1
    idest ← isrc2 shifted right by isrc1 bits


shra isrc1, isrc2, idest ........................ Shift Right Arithmetic
    idest ← isrc2 arithmetically shifted right by isrc1 bits


shrd isrc1ni, isrc2, idest ...................... Shift Right Double
    idest ← low-order 32 bits of isrc1ni:isrc2 shifted right by SC bits
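
The shrd operation concatenates two registers and extracts a 32-bit window, which the following Python sketch models. It assumes SC is a 5-bit shift count (0 through 31) previously latched by shr; the function name and argument order are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def shrd(isrc1ni: int, isrc2: int, sc: int) -> int:
    """Low-order 32 bits of the 64-bit value isrc1ni:isrc2 shifted right by SC."""
    combined = ((isrc1ni & MASK32) << 32) | (isrc2 & MASK32)
    return (combined >> (sc & 0x1F)) & MASK32   # assumption: SC is 5 bits wide


# Using the same register for both sources gives a rotate right.
print(hex(shrd(0x80000001, 0x80000001, 1)))
```

With SC equal to zero the result is simply isrc2, which is consistent with fnop being defined as shrd r0, r0, r0.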


st.c isrc1ni, csrc2 ............................. Store to Control Register
    csrc2 ← isrc1ni


st.x isrc1ni, #const(isrc2) ..................... Store Integer
    mem.x (isrc2 + #const) ← isrc1ni


stio.x isrc1ni, isrc2 ........................... Store I/O
    port.x (isrc2) ← isrc1ni
    NOTE: Not available with the i860 XR CPU


subs isrc1, isrc2, idest ........................ Subtract Signed
    idest ← isrc1 - isrc2
    OF ← (bit 31 carry ≠ bit 30 carry)
    CC set if isrc2 > isrc1 (signed)
    CC clear if isrc2 ≤ isrc1 (signed)


subu isrc1, isrc2, idest ........................ Subtract Unsigned
    idest ← isrc1 - isrc2
    OF ← NOT (bit 31 carry)
    CC ← bit 31 carry
    (i.e. CC set if isrc2 ≤ isrc1 (unsigned)
    CC clear if isrc2 > isrc1 (unsigned))


trap isrc1ni, isrc2, idest ...................... Software Trap
    Generate trap with IT set in psr


unlock .......................................... End Interlocked Sequence
    Clear BL in dirbase. The next load or store
    unlocks the bus. Interrupts are enabled.


xor isrc1, isrc2, idest ......................... Logical Exclusive OR
    idest ← isrc1 XOR isrc2
    CC set if result is zero, cleared otherwise


xorh #const, isrc2, idest ....................... Logical Exclusive OR High
    idest ← (#const shifted left 16 bits) XOR isrc2
    CC set if result is zero, cleared otherwise


Table 10.2. FADDP MERGE Update

Pixel Size (from PS) | Fields Loaded from Result into MERGE | Right Shift Amount (Field Size)

63..56, 47..40, 31..24,
63..58, 47..42, 31..26,
63..56, 31..24
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10.2 Instruction Format and Encoding 


All instructions are 32 bits long and begin on a four-byte boundary.
When operands are registers, the encodings shown in Table 10.3 are
used.

There are two general core-instruction formats (REG-format and
CTRL-format) and a separate format for floating-point instructions.


Table 10.3. Register Encoding

Register                       Encoding
Fault Instruction              0
Processor Status               1
Directory Base                 2
Data Breakpoint                3
Floating-Point Status          4
Extended Processor Status      5
Bus Error Address*             6
Concurrency Control*           7

NOTE:
*Available only with i860 XP CPU. Using these encodings
with the i860 XR CPU produces undefined results.

10.2.1 REG-FORMAT INSTRUCTIONS


Within the REG-format are several variations as shown in Figure 10.1.
Table 10.4 gives the encodings for these instructions. One encoding is
an escape code that defines yet another variation: the core escape
instructions. Figure 10.2 shows the format of this group, and
Table 10.5 shows the encodings.

In these instructions, the src2 field selects one of the 32 integer
registers (most instructions) or one of the control registers (st.c and
ld.c). Dest selects one of the 32 integer registers (most instructions)
or floating-point registers (fld, fst, pfld, pst, ixfr). For
instructions where src1 is optionally an immediate value, bit 26 of the
opcode (I-bit) indicates whether src1 is an immediate. If bit 26 is
clear, an integer register is used; if bit 26 is set, src1 is contained
in the low-order 16 bits, except for bte and btne instructions. For bte
and btne, the five-bit immediate value is contained in the src1 field.
For st, bte, btne, and bla, the upper five bits of the offset or
broffset are contained in the dest field instead of src1, and the lower
11 bits of offset are the lower 11 bits of the instruction.
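
The split offset used by st, bte, btne, and bla can be reassembled as in the Python sketch below. It assumes the dest field occupies bits 20..16 of the instruction word, as suggested by the field order in Figure 10.1; the function names are hypothetical.

```python
def split_offset16(instr: int) -> int:
    """Rebuild the 16-bit offset from the dest field and the low 11 bits."""
    high5 = (instr >> 16) & 0x1F     # assumption: dest field is bits 20..16
    low11 = instr & 0x7FF            # lower 11 bits of the instruction
    return (high5 << 11) | low11


def pack_offset16(offset: int) -> int:
    """Inverse helper: scatter a 16-bit offset into an otherwise empty word."""
    offset &= 0xFFFF
    return ((offset >> 11) << 16) | (offset & 0x7FF)


word = pack_offset16(0x8001)
print(hex(word), hex(split_offset16(word)))
```

The reassembled value is still a raw 16-bit field; sign extension and the brx shift are applied afterwards.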


For ld and st, bits 28 and zero determine operand size as follows:

Bit 28    Bit 0    Operand Size
0         0        8-bits
0         1        8-bits
1         0        16-bits
1         1        32-bits

When src1 is immediate and bit 28 is set, bit zero of
the immediate value is forced to zero.
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For fld, fst, pfld, pst, and flush, bit 0 selects autoin- 
crement addressing if set. For fld, fst, pfld, and pst, 
bits one and two select the operand size as follows: 


Bit 1    Bit 2    Operand Size
0        0        64-bits
0        1        128-bits
1        0        32-bits
1        1        32-bits


For flush, bits one and two must be zero. 
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When src7 is immediate, bits zero and one of the 
immediate value are forced to zero to maintain align- 
ment. When bit one of the immediate value is clear, 
bit two is also forced to zero. 


For the instructions Idio, stio, Idint, and scyc, the 
operand size is encoded by bits 9 and 10 as follows. 
For other instructions, these bits are reserved and 
should be set to zero. 


Operand Size | Bit10 | 


8 Bits (.b) 
16 Bits (.s) 
32 Bits (.l) 
reserved 


[Figure 10.1 shows four 32-bit instruction layouts, bits 31 through 0
(drawings 240874-74 through 240874-77). Fields named in each layout:
  Layout 1: OPCODE/I, SRC2, DEST, SRC1 (or IMMEDIATE, OFFSET, or NULL)
  Layout 2: OPCODE, SRC2, DEST, IMMEDIATE
  Layout 3: OPCODE/I, SRC2, OFFSET HIGH, SRC1 or SRC1S, OFFSET LOW
  Layout 4: OPCODE, SRC2, IMMEDIATE, OFFSET LOW]


Figure 10.1. REG-Format Variations
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fid.x, fst.x 
flush
pst.d
ld.c, st.c
bri
trap
bte, btne
pfld.y
addu, -s, subu, -s
shl, shr
shrd
bla
shra
and(h)
andnot(h)
or(h)
xor(h)

L Integer Length 
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- Table 10. 4. REG-Format ecoce "% 
31 30 


Load Integer
Store Integer
Integer to F-P Reg Transfer
(reserved)
Load/Store F-P
Flush
Pixel Store
Load/Store Control Register
Branch Indirect
Trap
(Escape for F-P Unit)
(Escape for Core Unit)
Branch Equal or Not Equal
Pipelined F-P Load
(CTRL-Format Instructions)
Add/Subtract
Logical Shift
Double Shift
Branch LCC Set and Add
Arithmetic Shift
AND
ANDNOT
OR
XOR
(reserved)


    0 - 8 bits
    1 - 16 or 32 bits (selected by bit 0)
LS  Load/Store
    0 - Load
    1 - Store
SO  Signed/Ordinal
    0 - Ordinal
    1 - Signed
H   High
    0 - and, or, andnot, xor
    1 - andh, orh, andnoth, xorh


- AS 


LR 


0 0 0 L 
o | o |} 0 i 
0. 0 6) Oo. 
0 0 om 1 
0 0 1 Oo |. 
0 0 1 1 0 
0 ‘0 a ome ee | 
0 8) 1 hls LS 
0 - 0 0 0 
0 1 0 0 0 
0 1 0 0 1 
0 1 0 0 1 
0 4 0 42.2) JE 
0 1 1 0 0 
0 Seed eee Se ee x 
1 0 0 | .SO. AS 
~4 ~ 0 ol 0 LR 
1 QO. 1 1 0 
ae bee 0 ; 4 1 0 
ol 0 ee 1° 1 
t-{. 1 0 _ O oH 
1 1 0 1 H 
oe 4 bea Snr oO; H 
on i 1 “oe 41 H 
1 el ie one ee de 1 
AS  Add/Subtract
    0 - Add
    1 - Subtract
LR  Left/Right
    0 - Left Shift
    1 - Right Shift
E   Equal
    0 - Branch on Unequal
    1 - Branch on Equal
I   Immediate
    0 - src1 is register
    1 - src1 is immediate


RESERVED BY INTEL CORPORATION (SET TO ZERO) 


240874-78 


Figure 10.2. Core Escape Instructions | 
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Table 10.5. Core Escape Opcodes 
; SS 


          (reserved)
lock      Begin Interlocked Sequence
calli     Indirect Subroutine Call
          (reserved)
intovr    Trap on Integer Overflow
          (reserved)
          (reserved)
unlock    End Interlocked Sequence
ldio*     Load I/O
stio*     Store I/O
ldint*    Load Interrupt Vector
scyc*     Special Cycles
          (reserved)
          (reserved)
          (reserved)


AOs24244uccdCDOOOOC|H 
xx -4ODO00=]=3 34 0CCCOO|N 
Benen OS SoS o/s 
xxx 4AOtOA0-0CA-A0-0/0 


—--—-§ OO0d0 O0dC0O 0000 00 


NOTE: 
*Available only with i860 XP CPU, not with i860 XR CPU 


10.2.2 CTRL-FORMAT INSTRUCTIONS 


The CTRL-Format instructions do not refer to registers; so, instead of the register fields, they have a 26-bit
relative branch offset. Figure 10.3 shows the format of these instructions and Table 10.6 defines the
encodings.
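
Splitting a CTRL-format word into its two fields can be sketched as follows. The sketch assumes the opcode occupies the six high-order bits (31..26) and the branch offset the remaining 26 bits; decode_ctrl is a hypothetical name, and no opcode values from Table 10.6 are assumed.

```python
def decode_ctrl(word: int):
    """Return (opcode, signed_broffset) for a 32-bit CTRL-format word."""
    word &= 0xFFFFFFFF
    opcode = word >> 26                  # assumption: six-bit opcode in bits 31..26
    broffset = word & 0x03FFFFFF         # 26-bit relative branch offset
    if broffset & 0x02000000:            # sign bit of the 26-bit field
        broffset -= 1 << 26
    return opcode, broffset


opcode, offset = decode_ctrl(0x6BFFFFFF)
print(bin(opcode), offset)
```

The signed offset returned here counts instructions; the byte displacement follows from the brx rule (shift left by two, add to the instruction pointer plus four).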




240874-79 


NOTE: . . 
BROFFSET is a signed 26-bit relative branch offset 


Figure 10.3. CTRL-Format Instructions 


Table 10.6. CTRL-Format Opcodes 
a 28 


(reserved)
(reserved)
Branch Direct
Call
Branch on CC Set
Branch on CC Clear

T   Taken
    0 - bc or bnc
    1 - bc.t or bnc.t
10.2.3 FLOATING-POINT INSTRUCTION ENCODING

The floating-point instructions also constitute an escape series. All
these instructions begin with the bit sequence 010010. Figure 10.4
shows the format of the floating-point instructions, and Table 10.7
gives the encodings. Within the dual-operation instructions is a
subcode DPC whose values are given in Table 10.8 along with the
mnemonic that corresponds to each.
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31 30 29 28 27 26/25 24.23 22 242019 18 17 16f15 1413 12 NOL 9{8/7/6 5 4 5 2 1 Of 


010010 | SRC2 | DEST | SRC1 | P | D | S | R | OPCODE

240874-80

SRC1, SRC2   Source; one of 32 floating-point registers
DEST         Destination; one of 32 floating-point registers
             (except fxfr; one of 32 integer registers)

P   Pipelining
    1  Pipelined instruction mode
    0  Scalar instruction mode
D   Dual-Instruction Mode
    1  Dual-instruction mode
    0  Single-instruction mode
S   Source Precision
    1  Double-precision source operands
    0  Single-precision source operands
R   Result Precision
    1  Double-precision result
    0  Single-precision result

Figure 10.4. Floating-Point Instruction Encoding | 
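
A field decoder for the floating-point format might look like the Python sketch below. The bit positions (SRC2 in 25..21, DEST in 20..16, SRC1 in 15..11, P, D, S, R in bits 10 through 7, opcode in bits 6..0) are read off the bit ruler of Figure 10.4 and should be treated as assumptions; decode_fp is a hypothetical name. The suffix is formed from the S and R bits following Table 10.1.

```python
FP_ESCAPE = 0b010010   # leading bit sequence of every floating-point instruction


def decode_fp(word: int) -> dict:
    """Split a 32-bit floating-point instruction word into its fields."""
    word &= 0xFFFFFFFF
    if word >> 26 != FP_ESCAPE:
        raise ValueError("not a floating-point escape instruction")
    s = (word >> 8) & 1
    r = (word >> 7) & 1
    return {
        "src2": (word >> 21) & 0x1F,
        "dest": (word >> 16) & 0x1F,
        "src1": (word >> 11) & 0x1F,
        "P": (word >> 10) & 1,
        "D": (word >> 9) & 1,
        "S": s,
        "R": r,
        "opcode": word & 0x7F,
        # source-precision letter first, result-precision letter second
        "suffix": "." + "sd"[s] + "sd"[r],
    }


word = (FP_ESCAPE << 26) | (3 << 21) | (4 << 16) | (2 << 11) | (1 << 10) | (1 << 8) | (1 << 7) | 0x30
print(decode_fp(word))
```

The opcode value 0x30 in the usage line is arbitrary and is not meant to identify a particular instruction.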


Table 10.7. Floating-Point Opcodes

[The opcode bit columns (bits 6 through 0) are not recoverable from this copy.]

Mnemonic       Description
pfam           Add and Multiply*
pfmam          Multiply with Add*
pfsm           Subtract and Multiply*
pfmsm          Multiply with Subtract*

(p)fmul        Multiply
fmlow          Multiply Low
frcp           Reciprocal
frsqr          Reciprocal Square Root
pfmul3.dd      3-Stage Pipelined Multiply

(p)fadd        Add
(p)fsub        Subtract
(p)fix         Fix
(p)famov       Adder Move
pfgt/pfle**    Greater Than
pfeq           Equal
(p)ftrunc      Truncate

fxfr           Transfer to Integer Register
(p)fiadd       Long-Integer Add
(p)fisub       Long-Integer Subtract

(p)fzchkl      Z-Check Long
(p)fzchks      Z-Check Short
(p)faddp       Add with Pixel Merge
(p)faddz       Add with Z Merge
(p)form        OR with MERGE Register


NOTE:
All opcodes not shown are reserved.

* pfam and pfsm have P-bit set; pfmam and pfmsm have P-bit clear.
** pfgt has R bit cleared; pfle has R bit set.



Table 10.8. DPC Encoding


DPC    PFAM       PFSM       M-Unit   M-Unit     A-Unit     A-Unit     T      K
       Mnemonic   Mnemonic   op1      op2        op1        op2        Load   Load*
0000   r2p1       r2s1       KR       src2       src1       M result   No     No
0001   r2pt       r2st       KR       src2       T          M result   No     Yes
0010   r2ap1      r2as1      KR       src2       src1       A result   Yes    No
0011   r2apt      r2ast      KR       src2       T          A result   Yes    Yes
0100   i2p1       i2s1       KI       src2       src1       M result   No     No
0101   i2pt       i2st       KI       src2       T          M result   No     Yes
0110   i2ap1      i2as1      KI       src2       src1       A result   Yes    No
0111   i2apt      i2ast      KI       src2       T          A result   Yes    Yes
1000   rat1p2     rat1s2     KR       A result   src1       src2       Yes    No
1001   m12apm     m12asm     src1     src2       A result   M result   No     No
1010   ra1p2      ra1s2      KR       A result   src1       src2       No     No
1011   m12ttpa    m12ttsa    src1     src2       T          A result   Yes    No
1100   iat1p2     iat1s2     KI       A result   src1       src2       Yes    No
1101   m12tpm     m12tsm     src1     src2       T          M result   No     No
1110   ia1p2      ia1s2      KI       A result   src1       src2       No     No
1111   m12tpa     m12tsa     src1     src2       T          A result   No     No


DPC    PFMAM      PFMSM      M-Unit   M-Unit     A-Unit     A-Unit     T      K
       Mnemonic   Mnemonic   op1      op2        op1        op2        Load   Load*
0000   mr2p1      mr2s1      KR       src2       src1       M result   No     No
0001   mr2pt      mr2st      KR       src2       T          M result   No     Yes
0010   mr2mp1     mr2ms1     KR       src2       src1       M result   Yes    No
0011   mr2mpt     mr2mst     KR       src2       T          M result   Yes    Yes
0100   mi2p1      mi2s1      KI       src2       src1       M result   No     No
0101   mi2pt      mi2st      KI       src2       T          M result   No     Yes
0110   mi2mp1     mi2ms1     KI       src2       src1       M result   Yes    No
0111   mi2mpt     mi2mst     KI       src2       T          M result   Yes    Yes
1000   mrmt1p2    mrmt1s2    KR       M result   src1       src2       Yes    No
1001   mm12mpm    mm12msm    src1     src2       M result   M result   No     No
1010   mrm1p2     mrm1s2     KR       M result   src1       src2       No     No
1011   mm12ttpm   mm12ttsm   src1     src2       T          M result   Yes    No
1100   mimt1p2    mimt1s2    KI       M result   src1       src2       Yes    No
1101   mm12tpm    mm12tsm    src1     src2       T          M result   No     No
1110   mim1p2     mim1s2     KI       M result   src1       src2       No     No
1111   Intel Reserved


NOTE:
* If K-load is set, KR is loaded when operand-1 of the multiplier is KR; KI is loaded when operand-1 of the multiplier is KI.
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10.3 Instruction Timings 


Generally, i860 XP microprocessor instructions take one clock to execute unless a freeze condition is
invoked. Detailed times, along with freeze conditions and their associated delays, are shown in the table
on the following pages. The following symbols are used for brevity in the timing table:


+n       n clocks must be added to the execution time if the stated conditions apply.

<-> n    The processor requires at least n clocks between the indicated instructions. The actual
         delay will be n minus the number of clocks for executing intervening instructions (or
         dual-mode pairs). If the time for intervening instructions is ≥ n, there is no delay.

m..n     Indicates a range of clocks. These cases are accompanied by a reference to a note
         where further explanation is available.

XR:      Applies to i860 XR microprocessors only.
XP:      Applies to i860 XP microprocessors only.

OA       The number of clocks to finish all outstanding accesses.

R1       The number of clocks from ADS# through the first READY# (80860XR) or BRDY#
         (80860XP) of the indicated bus activity.

R2       The number of clocks from ADS# through the second READY# or BRDY#.

RL       The number of clocks from ADS# through the last READY# or BRDY#.

RL1      XP: The number of clocks through last BRDY# of first access.

RN       XR: The number of clocks until next nonrepeated address can be issued (i.e., an address
         that is not the 2nd-4th cycle of a cache fill, the 2nd-8th cycle of a CS8 mode
         instruction fetch, nor the 2nd cycle of a 128-bit write).

RX       The number of clocks through READY# or BRDY# for the next 64-bit-or-less write cycle
         or second READY# or BRDY# for the next 128-bit write cycle.


NOTES:


a. "Address path full" means one address internally waiting for bus while external bus pipeline
   full.
b. "Store path full" means two stores or one 256-bit write-back internally waiting for bus while
   external bus pipeline full.


c. If a floating-point instruction, graphics instruction, fst, or pst is executed when a scalar
   floating-point operation (other than frcp or frsqr) is in progress, the scalar operation must
   complete first: two additional clocks for fadd, fix, fmlow, fmul.ss, fmul.sd, ftrunc, and
   fsub; three additional clocks for fmul.dd. Add one if either or both of these situations occur:

   1. There is an overlap between the result register of the previous scalar operation and
      the source of the floating-point operation, and the destination precision of the scalar
      operation differs from the source precision of the floating-point operation.

   2. The floating-point operation is pipelined and its destination is not f0.


TLB TLB miss. Five clocks plus the number of 
clocks to finish two reads plus the number of 
clocks to set A-bits (if necessary). 
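
The spacing symbol and note c above reduce to simple arithmetic, sketched here under stated assumptions. The function names are hypothetical, and treating the clock count from note c as the spacing n of the 2..4 range in the timing table is an interpretation, not something this section states outright.

```python
def spacing_delay(n: int, intervening_clocks: int) -> int:
    """Delay for a minimum-spacing requirement of n clocks.

    The delay is n minus the clocks spent on intervening instructions,
    and never less than zero.
    """
    return max(0, n - intervening_clocks)


# Additional clocks needed for an in-progress scalar operation to complete
# (values taken from note c).
SCALAR_COMPLETION_CLOCKS = {
    "fadd": 2, "fix": 2, "fmlow": 2, "fmul.ss": 2,
    "fmul.sd": 2, "ftrunc": 2, "fsub": 2,
    "fmul.dd": 3,
}


def note_c_clocks(scalar_op: str,
                  precision_overlap: bool = False,
                  pipelined_dest_not_f0: bool = False) -> int:
    """Clocks from note c: base count, plus one if either situation occurs."""
    clocks = SCALAR_COMPLETION_CLOCKS[scalar_op]
    if precision_overlap or pipelined_dest_not_f0:
        clocks += 1   # added once, even when both situations apply
    return clocks
```

Under this reading the result always falls between 2 and 4, which matches the range quoted for note c in the timing table.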


In addition, any instruction may be delayed due to an instruction cache miss or TLB miss during the
instruction fetch. The time for a TLB miss is shown above in note TLB. An instruction cache miss adds
the following delays:


• The number of clocks to get the next instruction from the bus (ADS# clock to first READY# or
  BRDY# clock, inclusive).


• XR: When any of the instructions in the new instruction-cache line is a branch or call or causes
  a freeze, the time through the last READY# for the new line.


• If the data cache is being accessed when the instruction-cache miss occurs, two clocks for data
  cache miss; one clock for hit.


Not included in the table is the delay caused by a trap. This depends on the trap handler.


In dual instruction mode, each pair of instructions requires the maximum of the times required by each
individual instruction.
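
The dual-instruction-mode rule can be sketched as follows. The helper names are hypothetical, and the per-instruction clock counts passed in are assumed to have been worked out already from the timing table.

```python
from typing import Iterable, Tuple


def pair_clocks(core_clocks: int, fp_clocks: int) -> int:
    """A dual-mode pair takes the maximum of the two individual times."""
    return max(core_clocks, fp_clocks)


def sequence_clocks(pairs: Iterable[Tuple[int, int]]) -> int:
    """Total clocks for a run of dual-mode pairs, ignoring cross-pair freezes."""
    return sum(pair_clocks(core, fp) for core, fp in pairs)
```

The sum ignores spacing requirements between pairs, so it is a lower bound rather than an exact figure.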
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Instruction Execution | Condition 
Clocks 


adds 


addu 

and 

andh 

andnot 

andnoth

bc      If branch not taken.


If branch taken.
If the prior instruction is addu, adds, subu, subs, pfeq, or pfgt.


If branch taken.
If branch not taken.
If the prior instruction is addu, adds, subu, subs, pfeq, or pfgt.


If branch taken.
If branch not taken.


(same as bc)


(same as bc.t)


If branch not taken.
If branch taken.


(same as bte)


If r1 referenced in next instruction.
If data cache load miss in progress for a read of less than 128 bits.
If data cache load miss in progress for 128-bit read.


If r1 referenced in next instruction.
If data cache load miss in progress for a read of less than 128 bits.
If data cache load miss in progress for 128-bit read.


(...and all other A-unit instructions except dual operations)
If executed when a scalar floating-point operation (other than frcp
or frsqr) is in progress. (c)
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Execution 


Instruction   Clocks      Condition


faddp         1           (...and all other G-unit instructions except fiadd.w, fxfr)
              +1          If fdest is used by next instruction and next instruction is G-, M- or A-unit instruction.
              <-> 2..4    If executed when a scalar floating-point operation (other than frcp or frsqr) is in
                          progress.(c)


faddz                     (same as faddp)
famov.r                   (same as fadd.p)


fiadd.w       1
              +1          If fdest is used by next instruction and next instruction is M- or A-unit instruction
                          (except when fiadd is used for fmov.dd or fmov.ss).
              +1          If fdest is used by next instruction and next instruction is G-unit instruction.
              <-> 2..4    If executed when a scalar floating-point operation (other than frcp or frsqr) is in
                          progress.(c)


fisub.w                   (same as faddp)
fix.v                     (same as fadd.p)
fld.y         1
              +1          If this is the instruction after a st, fst or pst that hits the data cache.
              <-> 2       If fdest is referenced in the next two instructions.
              +1+R1       If 32-bit fld.l or 64-bit fld.d misses the data cache.
              +1+R2       If 128-bit fld.q misses the data cache.
              +1+RL       If data cache load miss in progress (except in the following case).
              <-> 2       XP: If this instruction follows a data cache access that misses in the virtual tags but
                          hits in the physical tags.
              +2          XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
              +R2         XP: If data-cache line write-back due to snoop is in progress.
              +RN         XR: If address path full.(a)
              +RL1        XP: If address path full.(a)
              +TLB        If TLB miss.


flush         1
              <-> 3       XR: If preceded by another flush.
              <-> 2       XP: If preceded by another flush.
              +R2         XP: If data-cache line write-back due to snoop is in progress.
              +1+RX       If flush to modified line when store path full.(b)
              +TLB        If TLB miss.


fmlow.dd      1           (...and all other M-unit instructions except dual operations)
              +1          If fsrc1 refers to result of the prior operation (either scalar or pipelined).
              +1          If the prior operation is a double-precision multiply.
              <-> 2..4    If executed when a scalar floating-point operation (other than frcp or frsqr) is in
                          progress.(c)


fmov.r
                          fmov.ss and fmov.dd same as fiadd.w
                          fmov.sd and fmov.ds same as fadd.p


fmul.p                    (same as fmlow.dd)
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instruction execution Condition | 
Clocks ) | 


fnop          1

form                      (same as faddp)
frcp.p                    (same as fmlow.dd)
frsqr.p                   (same as fmlow.dd)


fst.y         1
              +1          If followed by pipelined floating-point operation that overwrites the register
                          being stored.
              +1+RL       If data cache load miss in progress.
              +2          XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
              <-> 2       XP: If this instruction follows a data cache access that misses in the virtual
                          tags but hits in the physical tags.
              +R2         XP: If data-cache line write-back due to snoop is in progress.
              <-> 2..4    If executed when a scalar floating-point operation (other than frcp or frsqr) is
                          in progress.(c)
              +RN         XR: If address path full.(a)
              +RL1        XP: If address path full.(a)
              +1+RX       If cache miss when store path full.(b)
              +TLB        If TLB miss.


fsub.p                    (same as fadd.p)


ftrunc.v                  (same as fadd.p)


fxfr          1
              +1          If idest referenced in next instruction.
              +1+R1       If data cache load miss in progress for 64-bit read.
              +1+R2       If data cache load miss in progress for 128-bit read.
              <-> 2..4    If executed when a scalar floating-point operation (other than frcp or frsqr) is
                          in progress.(c)


fzchkl                    (same as faddp)
fzchks                    (same as faddp)
intovr
ixfr
                          If data cache load miss in progress for 64-bit read.
                          If data cache load miss in progress for 128-bit read.
                          If fdest is referenced in the next two instructions.


                          If idest referenced in next instruction.
                          If data cache load miss in progress for 64-bit read.
                          If data cache load miss in progress for 128-bit read.
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Condition 


ld.x


ldint.x

ldio.x

lock

mov

nop

or

orh
pfadd.p

pfaddp

pfaddz


pfam.p


pfamov.r
pfeq.p
pfgt.p


pfiadd.w


Clocks


1


1


If idest referenced in next instruction.

If this is the instruction after a st, fst or pst that hits the data cache.

If data cache load miss in progress.

If ld.x misses the data cache and a subsequent instruction references the
idest of the ld.x (except for following case).

XP: If this instruction follows a data cache access that misses in the virtual
tags but hits in the physical tags.

XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
XP: If data-cache line write-back due to snoop is in progress.

XR: If address path full.(a)

XP: If address path full.(a)


If cache miss when store path full.(b)


If TLB miss.


+OA


+OA


(same as fadd.p)


(same as faddp)


(same as faddp)


+1
+1
<-> 2..4


(...and all other dual operations)

If fsrc1 refers to result of the prior operation (either scalar or pipelined).

If the prior operation is a double-precision multiply.

If executed when a scalar floating-point operation (other than frcp or frsqr) is
in progress.(c)


(same as fadd.p)


(same as fadd.p)


(same as fadd.p)


(same as faddp)


(same as faddp)


(same as fadd.p)
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: Execution a 
Instruction   Clocks      Condition
pfld.y        1
              +1+RL       If data cache load miss in progress.
              <-> 2       If fdest is referenced in the next two instructions.
              +1+RL1      If three pfld's are outstanding.
              +2+OA       XR: If pfld hits data cache.
              +2          XP: If the prior instruction is a pfld.y that hits a modified line in the
                          data cache.
              <-> 2       XP: If this instruction follows a data cache access that misses in
                          the virtual tags but hits in the physical tags.
              +R2         XP: If data-cache line write-back due to snoop is in progress.
              +RN         XR: If address path full.(a)
              +RL1        XP: If address path full.(a)
              +TLB        If TLB miss.
pfle.p        1
pfmam.p                   (same as pfam.p)
pfmov.r
                          pfmov.ss and pfmov.dd same as faddp
                          pfmov.sd and pfmov.ds same as fadd.p
pfmsm.p                   (same as pfam.dd)
pfmul.p                   (same as fmlow.dd)
pfmul3.dd                 (same as fmlow.dd)
pform                     (same as faddp)
pfsm.p                    (same as pfam.dd)
pfsub.p                   (same as fadd.p)
pftrunc.v                 (same as fadd.p)
pfzchkl                   (same as faddp)
pfzchks                   (same as faddp)
pst.d                     (same as fst.d)
scyc.x        1+OA
shl           1
shr           1
shra          1
shrd          1
st.c
              +1+R1       If data cache load miss in progress for a read of less than 128 bits.
              +1+R2       If data cache load miss in progress for 128-bit read.
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Condition 


If data cache load miss in progress.

XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
XP: If this instruction follows a data cache access that misses in the virtual
tags but hits in the physical tags.

XP: If data-cache line write-back due to snoop is in progress.


XR: If address path full.(a)
XP: If address path full.(a)


If cache miss when store path full.(b)


If TLB miss.
stio.x
subs
subu
trap
unlock
xor


xorh


10.4 Instruction Characteristics 


The following table lists some of the characteristics of each instruction. The characteristics are:


• What processing unit executes the instruction. The codes for processing units are:

  A   Floating-point adder unit
  E   Core execution unit
  G   Graphics unit
  M   Floating-point multiplier unit

• Whether the instruction is pipelined or not. A P indicates that the instruction is pipelined.

• Whether the instruction is a delayed branch instruction. A D marks the delayed branches.

• Whether execution is suppressed in user mode. An SU marks supervisor-only instructions.

• Whether the instruction is available on both the i860 XR and i860 XP microprocessors. An XP
  marks instructions that are available only on the i860 XP microprocessor.

• Whether the instruction changes the condition code CC. A CC marks those instructions that
  change CC.

• Which faults can be caused by the instruction. The codes used for exceptions are:

  IT    Instruction Fault
  SE    Floating-Point Source Exception
  RE    Floating-Point Result Exception, including overflow, underflow, inexact result
  DAT   Data Access Fault

  Note that this is not the same as specifying at which instructions faults may be reported. A result
  exception is reported on the subsequent floating-point instruction, pst, fst, or sometimes fld, pfld,
  and ixfr.

  The instruction access fault IAT and the interrupt trap IN are not shown in the table because they
  can occur for any instruction.


• Performance notes. These comments regarding optimum performance are recommendations only.
  If these recommendations are not followed, the i860 XP microprocessor automatically waits the
  necessary number of clocks to satisfy internal hardware requirements. The following notes define
  the numeric codes that appear in the instruction table:

  1. The following instruction should not be a conditional branch (bc, bnc, bc.t, or bnc.t).

  2. The destination should not be a source operand of the next two instructions.

  3. A load should not directly follow a store that is expected to hit in the data cache.

  4. When the prior instruction is scalar, fsrc1 should not be the same as the fdest of the prior
     operation.

  5. The fdest should not reference the destination of the next instruction if that instruction is a
     pipelined floating-point operation.

  6. The destination should not be a source operand of the next instruction. (For call and calli,
     the destination is r1.)

  7. When the prior operation is scalar and multiplier op1 is fsrc1, fsrc2 should not be the same as
     the fdest of the prior operation.

  8. When the prior operation is scalar, src1 and src2 of the current operation should not be the
     same as dest of the prior operation.

  9. A pfld should not immediately follow a pfld.

• Programming restrictions. These indicate combinations of conditions that must be avoided by
  programmers, assemblers, and compilers. The following notes define the alphabetic codes that
  appear in the instruction table:

  a. The sequential instruction following a delayed control-transfer instruction may not be another
     control-transfer instruction, nor a trap instruction, nor the target of a control-transfer
     instruction.


Pipelined? 
Delayed? 
Supervisor? 
i860™ XP Only?

adds 
addu 
and 
andh 
andnot 


andnoth 


coo sf 


E 
E 
E 
E 
E 
E- 
E 
E 
E 
E 
E 
E 
A 
G 
G 
A 
G 
G 
A 
“le 


NOTES: 


b. When using a bri to return from a trap handler, programmers should take care to prevent traps
   from occurring on that or on the next sequential instruction. IM should be zero (interrupts
   disabled) when the bri is executed.

c. If fdest is not zero, fsrc1 must not be the same as fdest.

d. When fsrc1 goes to multiplier op1 or to KR or KI, fsrc1 must not be the same as fdest.

e. If dest is not zero, src1 and src2 must not be the same as dest.

f. isrc1 must not be the same register as isrc2 for the autoincrementing form of this instruction.

g. isrc1 must not be the same register as isrc2.

h. flush must not be used in a locked sequence or in dual instruction mode.


Performance Programming 
Notes ‘Restrictions 


* On the i860 XP microprocessor, the pipelined instructions can generate ITR with PI.
** On the i860 XR microprocessor, the 128-bit pfld.q is not available. If used it causes an instruction trap.
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Pipelined? | | 
Delayed? Performance | Programming 
Supervisor? Notes Restrictions 
i860TM XP Only? | ae | 
flush 
fmlow.dd
fmul.p
form
frcp.p
frsqr.p
fst.y
fsub.p
ftrunc.p
fxfr
fzchkl
fzchks
intovr
ixfr
ld.c
ld.x
ldint.x
ldio.x
lock
or
orh
pfadd.p
pfaddp
pfaddz
pfam.p
pfamov.r
pfeq.p
pfgt.p
pfiadd.w
pfisub.w
pfix.p
pfld.y
pfmam.p
pfmsm.p
pfmul.p
pfmul3.dd
pform
pfsm.p
pfsub.p
pftrunc.p
pfzchkl


E 
M 
— oM 
G 
M 
M 
Ee 
LN 
A 
G 
G 
G 
E 
E 
E 
E 
E 
E 
—_ 
E 
A 
G 
G 
A& 


< 
VuvuViv<~ VV 


| P,(XP)** 


> 
QOrreKeg|sstReemli rgonrr!> 
<< << | | 


VvVv00T0U;UU UU 


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pfzchks 
pst.d 
scyc.x
shl 

shr 


subs 
subu 
trap 
unlock 
xor 
xorh 


G 
E 
E 
E 
E 
E 

'£ 
E 
E 
E 
E 
E 
E 
E 
E 
E 




10.5 Software Compatibility 
10.5.1 REQUIRED CHANGES 


To port existing systems software from the i860 XR microprocessor to the i860 XP microprocessor, the
following changes may be required. Applications software does not require changes.


1. Data cache flush. All four ways of the data cache must be flushed on the i860 XP microprocessor.
   The cache flush routine can be modified to check processor type in epsr or the DCS field of
   dirbase and flush the appropriate number of ways.


2. Parity and bus error traps. If the i860 XP system signals these errors, the trap handler must be
   extended to handle them. Software must avoid testing the BEF and PEF bits unless executing on the
   i860 XP microprocessor.


3. LOCK# deactivation. On the i860 XP microprocessor, traps do not automatically deactivate the
   LOCK# signal, so the trap handler must do a data access to deactivate LOCK#. Trap handlers
   that already access data soon after invocation do not require this modification.


4. Load pipe precision. The precision of the last stage of the load pipeline is specified by the LRP
   bit on the i860 XR microprocessor but by the LRP0 and LRP1 bits on the i860 XP microprocessor.
   The procedure that restores the load pipe must check the processor type, use the appropriate
   bits, and restore the correct precision. Pipe restoration code for the i860 XR microprocessor
   will work correctly on the i860 XP microprocessor if pfld.q is not used.


5. Pre-accessed trap handler pages. Page-directory and page-table entries for the instruction pages
   of the trap handler and for the first data page accessed by the trap handler must always have
   A = 1. Software modified to allocate page tables this way works on both i860 XR and i860 XP
   microprocessors.


6. Page directory entry bit 7 must be zero. This is the bit that selects four Mbyte or four Kbyte page
   size. On the i860 XR microprocessor, it is reserved and should be set to zero. It must be set
   to zero for four Kbyte pages to work on the i860 XP microprocessor.
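
Two of the changes above (items 1 and 6) can be modeled in a few lines. This is a host-side sketch, not i860 code: the `Cpu` enum and both helper names are hypothetical, how the processor type is actually read from epsr or dirbase is not shown, and the two-way figure for the i860 XR data cache is an assumption not stated in this section.

```python
from enum import Enum


class Cpu(Enum):
    XR = "i860 XR"
    XP = "i860 XP"


# Ways to flush per processor type. Four ways on the XP comes from item 1;
# two ways on the XR is an assumption.
DATA_CACHE_WAYS = {Cpu.XR: 2, Cpu.XP: 4}

PDE_PAGE_SIZE_BIT = 1 << 7   # page directory entry bit 7 (item 6)


def ways_to_flush(cpu: Cpu) -> int:
    """Number of data-cache ways a portable flush routine should cover."""
    return DATA_CACHE_WAYS[cpu]


def pde_is_portable(pde: int) -> bool:
    """True if bit 7 of a page directory entry is zero, as item 6 requires
    for four Kbyte pages on both processors."""
    return (pde & PDE_PAGE_SIZE_BIT) == 0
```

A page-table builder could assert `pde_is_portable` on every directory entry it writes when the same image must run on both processors.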


10.5.2 PERFORMANCE OPTIMIZATIONS 


Software developers may wish to make the following performance enhancements in systems software for
the i860 XP microprocessor. Systems software that must execute on both i860 XP and i860 XR systems
can contain code both with and without the optimizations. By testing the processor type, the appropriate
instruction path can be determined.


1. Data cache flush. On the i860 XP microprocessor, a complete flushing of the data cache is not
   needed when changing context or marking a page not present.


2. The epsr bits AI, DI, PI, and PT can be used on the i860 XP microprocessor to make trap
   handlers more efficient.


3. Four-Mbyte pages can be allocated to frame buffers and the operating-system kernel, thereby
   reducing the cost of TLB misses.
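
The benefit of item 3 can be seen with a little arithmetic: a region mapped with larger pages needs fewer translations. The sketch below counts the pages needed to cover a region. The helper name is hypothetical, and the region sizes in the usage note are illustrative values, not figures from this data sheet.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages (and so distinct translations) covering a region."""
    if region_bytes < 0 or page_bytes <= 0:
        raise ValueError("sizes must be non-negative and page size positive")
    return -(-region_bytes // page_bytes)   # ceiling division
```

For example, a 4-Mbyte region needs `pages_needed(4 * MBYTE, 4 * KBYTE)` = 1024 four-Kbyte pages but only one four-Mbyte page.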


10.5.3 NEW FEATURES 


Software that uses the new features available only on the i860 XP microprocessor will not be
compatible with the i860 XR microprocessor unless alternate instruction paths are provided.


Systems software features:


1. New instructions ldio, stio, ldint, and scyc.
2. Four-Mbyte pages.
3. Privileged Registers p0, p1, p2, and p3.
4. Concurrency control unit.
5. 128-bit load instruction pfld.q.
6. Support for address aliases.


Applications software features:


1. Concurrency control unit.


2. 128-bit load instruction pfld.q. The i860 XR microprocessor traps on pfld.q; therefore, software
   has the opportunity to emulate a pfld.q with two pfld.d instructions. However, this strategy does
   not yield optimum performance on the i860 XR microprocessor.
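
The emulation idea in item 2 amounts to replacing one 128-bit load with two 64-bit loads at consecutive addresses. The following host-side model shows only the data movement, not i860 trap-handler code: memory is a `bytes` object, the function names are hypothetical, and little-endian byte order and 16-byte alignment are assumptions.

```python
def load64(mem: bytes, addr: int) -> int:
    """Model of one 64-bit load (little-endian byte order assumed)."""
    return int.from_bytes(mem[addr:addr + 8], "little")


def emulate_load128(mem: bytes, addr: int) -> int:
    """Model of a 128-bit load built from two 64-bit loads."""
    if addr % 16 != 0:
        raise ValueError("128-bit operand assumed to be 16-byte aligned")
    if addr + 16 > len(mem):
        raise ValueError("load runs past the end of memory")
    low = load64(mem, addr)          # first 64-bit load
    high = load64(mem, addr + 8)     # second 64-bit load
    return (high << 64) | low
```

The extra load, and on the i860 XR the trap taken to reach the emulation, is the overhead the text refers to.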


10.5.4 NOTES


On the i860 XP microprocessor, pages with WT = 1 are cached with the write-through policy; whereas,
on the i860 XR microprocessor, they are not cached at all. Because this change in the function of WT
was anticipated in the i860 XR microprocessor documentation, no incompatibility should arise.
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11.0 REVISION HISTORY 


DATA SHEET REVISION REVIEW


The following list represents the major differences between version 002 and version 001 of the i860 XP
Microprocessor Data Sheet.


Section 2.2.4       AI bit has been changed to TAI in Figure 2.5. The explanation for PI
                    bit has been expanded.

Section 4.2.33      PCHK# signal description has been expanded.

Section 4.2.35      Output buffer configuration has been added in PEN# signal description.

Section 4.2.37      RESET description has been expanded.

Section 5.1.3       Table 5.2 has been corrected.
                    The explanation of write/read and read/write pipelining has been revised.

Section 5.2.2.4-5   The explanation of late back-off mode has been expanded.
                    Figure 5.27 has been corrected.


The explanation of EWBE# timing has been corrected.
RESET initialization description has been expanded.
D.C. Characteristics are corrected.
A.C. Characteristics are replaced with nominal timings based on CL = 0 pF.
Figure 9.3 and Figure 9.4 have been replaced with nominal A.C. timings based on CL = 0 pF.
Figure 9.5 has been corrected for normal and high-current output buffers.
Component buffer model has been added.
Programming restriction on flush instruction has been added.


Section 5.3.4
Section 5.5


Section 9.2


Section 9.4


Section 10.4
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7 : ; AA 
| | fsr U-bit (update bit), 2.2.8 

8-bit pixel pmpeste 

data type, 2.1.4 : access rights | 

pest address translation caches, 3.1 

16-bit pixel 

data type, 2.1.4 | A.C. characteristics 

electrical data, 9.3 

16-bit values | F 

alignment requirements, 2.3 addressing 

i860 XP microprocessor, 2.3 
32-bit binary floating-point 
i a modes, 2.7 


single-precision real, 2.1.3 
address space 


32-bit integer 
consistency, 3.3.1 


data type, 2.1.1 | 
address translation 


ef eat 019 algenthin, 2.4.5 
caches, 3.1 
32-bit pixel | . 2 _ faults, 2.4.6 
data type, 2.1.4 | f P (present) bit, 2.4.4.2 
apie siuee virtual addressing, ae 
alignment requirements, 2.3 a adds (Add Signed) 


epsr OF (overflow flag), 2.2.4 
instruction definition, 10.1 
instruction timing, 10.3 


64-bit binary floating-point 
double-precision real, 2.1:3 
floating-point register file, 2.2.2 | | 

addu (Add Unsigned) | 
epsr OF (overflow flag), 2.2.4 
instruction definition, 10.1 
instruction timing, 10.3. 


64-bit integer 
data type, 2.1.1 
floating-point register file, 2.2.2 


64-bit values 


; | ADS # (address status) 
alignment requirements, 2.3 — 


AHOLD (address hold), 4.2.3 
128-bit load and store instructions _. signal description, 4.2.2 


floating-point register file, 2.2.2 
g-p g AE 


128-bit values . a . fsr U-bit (update bit), 2.2.8 


li nt requirements, 2.3 | 
MIGAMEN: LeQUIFeMe Sire AHOLD (address hold) 


82495XP/82490XP cache | bus arbitration, 5.2 — 

BRDY # (burst ready), 4.2.7 | signal description, 4.2.3 

external secondary cache, 1.0 _ | a 

write-once policy, 3.2.4.2 | gone 

| address translation, 2.4.5 

A31-A3 (address pins) » ah | cache replacement, 3.2.3 

signal description, 4.2.1 ee 

| | aliasing 

A (accessed) | — 3 instruction cache, 3.2.2 

page-table entries (PTEs), 2.4.4.6 | internal instruction and data caches, 3.2 
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alignment 
requirements, 2.3 


andh (Logical AND High)
instruction definition, 10.1
instruction timing, 10.3


and (Logical AND)
instruction definition, 10.1
instruction timing, 10.3


andnoth (Logical AND NOT High)
instruction definition, 10.1
instruction timing, 10.3


andnot (Logical AND NOT)
instruction definition, 10.1
instruction timing, 10.3


ANSI/IEEE Standard 754-1985, 1.0


AO
fsr U-bit (update bit), 2.2.8


arbitration
bus operation, 5.2
HOLD and HLDA, 5.2.1


ATE (address translation enable)
address translation, 2.4
dirbase format description, 2.2.6


AU
fsr U-bit (update bit), 2.2.8


B

back-off 
bus cycle, 5.2.2 
late modes, 5.2.2.3 — 
one-clock late mode, 5.2.2.4 
two-clock late mode, 5.2.2.5 


bc (Branch on CC)
instruction definition, 10.1
instruction timing, 10.3


bc.t (Branch on CC, Taken)
instruction definition, 10.1
instruction timing, 10.3
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BE7 #-—BE0O# (byte enables) 
signal description, 4.2.4


bear (bus error address register)
format description, 2.2.10


BE (big endian)
data cache, 3.2.1
epsr format description, 2.2.4


BEF (bus error flag)
epsr format description, 2.2.4


BEn#
BE7#-BE0# (byte enables), 4.2.4


BERR (bus error)
bear (bus error address register), 2.2.10
bus error trap, 2.8.7
epsr BEF (bus error flag), 2.2.4
psr IM (interrupt mode), 2.2.3
signal description, 4.2.5


big endian mode
addressing, 2.3


bla (Branch on LCC and Add)
epsr AI (trap on autoincrement instruction), 2.2.4
instruction definition, 10.1
instruction timing, 10.3


BL (bus lock)
dirbase format description, 2.2.6


bnc (Branch on Not CC)
instruction definition, 10.1
instruction timing, 10.3


bnc.t (Branch on Not CC, Taken)
instruction definition, 10.1
instruction timing, 10.3


BOFF# (back-off)
ADS# (address status), 4.2.2
BERR (bus error), 4.2.5
bus arbitration, 5.2
dirbase LB (late back-off mode), 2.2.6
FLINE# choice, 5.3.5.1
signal description, 4.2.6
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boundary scan 
register cell ordering, 6.5 


BPR (bypass register) 
test, 6.2 


br (Branch Direct Unconditionally)
instruction definition, 10.1
instruction timing, 10.3


BR (break read)
debugging i860 XP microprocessor, 2.9
psr format description, 2.2.3


BRDY# (burst ready)
bear (bus error address register), 2.2.10
BERR (bus error), 4.2.5
epsr IL (interlock), 2.2.4
locked access, 3.2.4.3
signal description, 4.2.7
write-once policy, 3.2.4.2


BREQ (bus request)
signal description, 4.2.8


bri (Branch Indirect Unconditionally)
instruction definition, 10.1
instruction timing, 10.3


BS (bus or parity error trap in Supervisory mode) 


epsr format description, 2.2.4 


BSR (boundary scan register) 
test, 6.2 


bte (Branch If Equal) 
instruction definition, 10.1 
instruction timing, 10.3 


btne (Branch If Not Equal) 
instruction timing, 10.3 


buffer 
models, 9.4 
size, selection with PEN#, 4.2.35, 5.5, 9.4.3 


burst cycles 
bus cycle, 5.1.2 


bus arbitration 
bus operation, 5.2 
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function of, 1.0 


bus cycles 


back-off and restart, 5.2.2 
bus operation, 5.1 
type output pins, 4.1 


bus errors 


bear (bus error address register), 2.2.10 
trap, 2.8.7 


bus operation 


i860 XP microprocessor, 5.0 


BW (break write) 


debugging i860 XP microprocessor, 2.9 
psr format description, 2.2.3 


BYPASS# (bypass)
signal description, 4.2.9
TAP encoding, 6.3


C


CACHE# (cacheability)
BE7#-BE0# (byte enables), 4.2.4
signal description, 4.2.10


cache 


address translation, 3.1 
consistency protocol, 3.2.4 
external secondary, 1.0 
inquiry cycles (snooping), 5.3 


internal instruction and data, 3.2


invalidating entries, 3.3 
on-chip, 3.0 
replacement algorithm, 3.2.3 


cacheability 


address translation caches, 3.1
consistency, 3.3.4 


calli (Indirect Subroutine Call) 


instruction definition, 10.1 
instruction timing, 10.3 


call (Subroutine Call) 


instruction definition, 10.1 
instruction timing, 10.3 


capture-DR
test state, 6.4.5


capture-IR
test state, 6.4.11


CC (condition code) 
psr format description, 2.2.3


ccr (concurrency control register)
DCCU initialization, 2.5.1 
format description, 2.2.12 


CCUBASE 


ccr (concurrency control register), 2.2.12


DCCU addressing, 2.5.2 
DCCU initialization, 2.5.1 


CD (cache disable) 


bypassing instruction and data cache, 3.3
page-table entries (PTEs), 2.4.4.5


CLK (clock) 
signal description, 4.2.11 


CO (CCU on) 
ccr (concurrency control register), 2.2.12 


color intensity shading 
pixel formats, 2.1.4 


compatibility 
pipelined cycles, 5.1.3
software changes, 10.5.1


concurrency control unit (CCU) 
ccr (concurrency control register), 2.2.12
detached CCU, 2.5 
NEWCURR register, 2.2.13 


consistency
address space, 3.3.1 
cacheability, 3.3.4 
instruction cache, 3.3.2 
internal cache, 3.3 
load pipe, 3.3.5 
page table, 3.3.3 
protocol, 3.2.4 

write-once policy, 3.2.4.2


control registers 
register set, 2.2


copy-back policy 
data cache update, 3.2.1.1 


core execution unit 
function of, 1.0 


CS8 (code size 8-bit) 
BE7#-BE0# (byte enables), 4.2.4 
dirbase format description, 2.2.6 


CTRL-format 
instructions, 10.2.2 


CTYP (cycle type) 
signal description, 4.2.12 


current mode 
high vs. normal, 4.2.35, 5.5, 9.3, 9.4.3 


cycles 
back-off, 5.2.2.1
burst cycles, 5.1.2 
interrupt acknowledge, 5.1.4 
pipelined, 5.1.3 
restart, 5.2.2.2 
special bus, 5.1.5 


D 


D63-D0 (data pins) 
signal description, 4.2.14


data access 
fault, 2.8.5 


data cache 

bypassing, 3.3 
flushing, 3.3

function of, 1.0 
operation, 3.2 
organization, 3.2.1 
states, 3.2.4.1 
update policies, 3.2.1.1 


data types 
i860 XP microprocessor, 2.1 


DAT (data access trap) 
debugging i860 XP microprocessor, 2.9
psr format description, 2.2.3 
db (data breakpoint register)
debugging i860 XP microprocessor, 2.9
format description, 2.2.5
psr BR (break read) and BW (break write), 2.2.3


D bit
dual-instruction mode, 2.6.2


D/C# (data/code)
signal description, 4.2.13


D.C. characteristics
electrical data, 9.2


DCCU (detached concurrency control unit)
addressing, 2.5.2
ccr (concurrency control register), 2.2.12
function of, 1.0
initialization, 2.5.1
internals, 2.5.3


DCS (data cache size)
epsr format description, 2.2.4


D (dirty)
page-table entries (PTEs), 2.4.4.6


debugging
i860 XP microprocessor, 2.9


deferred-write policy
data cache update, 3.2.1.1


denormal
special floating-point values, 2.1.3


Detached
STAT register description, 2.2.14


detached CCU
i860 XP microprocessor, 2.5


d.fnop
dual-instruction mode, 2.6.2


DID (device identification register)
test, 6.2


DIR
virtual address, 2.4.2


dirbase (directory base register)
address space consistency, 3.3.1
cache replacement algorithm, 3.2.3
DCCU initialization, 2.5.1
format description, 2.2.6
instruction cache consistency, 3.3.2
page directory, 2.4.3
page table consistency, 3.3.3
P (present) bit, 2.4.4.2


disassemblers
big endian mode, 2.3


DI (trap on delayed instruction)
epsr format description, 2.2.4


DO (detached only)
ccr (concurrency control register), 2.2.12


DM (dual instruction mode)
psr format description, 2.2.3


double-precision real
data type, 2.1.3


double real value
floating-point registers, 2.1.3


double-shift instruction
psr SC (shift count), 2.2.3


DP7-DP0 (data parity)
signal description, 4.2.15


DPC (data-path control)
dual-operation instructions, 2.6.3


DPS (DRAM page size)
dirbase format description, 2.2.6


DS (delayed switch)
psr format description, 2.2.3


DTB (directory table base)
dirbase format description, 2.2.6


dual-instruction mode
parallelism, 2.6.2


dual-operation instructions
floating-point, 2.6.3
E


EADS# (external address status)
AHOLD (address hold), 4.2.3
signal description, 4.2.16


epsr (extended processor status register) 
data cache, 3.2.1
DCCU internals, 2.5.3 
format description, 2.2.4 
page-table entries (PTEs), 2.4.4.3 


EWBE# (external write buffer empty)
epsr SO (strong ordering), 2.2.4 
signal description, 4.2.17 


exit1-DR
test state, 6.4.7 


exit1-IR 
test state, 6.4.13 


exit2-DR 
test state, 6.4.9 


exit2-IR
test state, 6.4.15 


EXTEST
TAP encoding, 6.3


F 

faddp (Add with Pixel Merge) 
instruction definition, 10.1
instruction timing, 10.3 


fadd.p (Floating-Point Add) 
instruction definition, 10.1 
instruction timing, 10.3


faddz (Add with Z Merge) 
instruction definition, 10.1 
instruction timing, 10.3 


famov.r (Floating-Point Adder Move) 
instruction definition, 10.1 
instruction timing, 10.3 
fault
address translation, 2.4.6 
data access, 2.8.5 
floating-point, 2.8.3 
instruction access, 2.8.4 
result exception fault, 2.8.3.1 
source exception fault, 2.8.3.1 


fiadd.w (Long-Integer Add) 
instruction definition, 10.1 
instruction timing, 10.3 


fir (fault instruction register) 
epsr DI (trap on delayed instruction), 2.2.4 
format description, 2.2.7 


fisub.w (Long-Integer Subtract) 
instruction definition, 10.1 
instruction timing, 10.3 


fix.v (Floating-Point to Integer Conversion) 
instruction definition, 10.1
instruction timing, 10.3 


fld.y (Floating-Point Load) 
instruction definition, 10.1 
instruction timing, 10.3 


FLINE# (flush line) 
BOFF# choice, 5.3.5.1 
signal description, 4.2.18 


floating-point 
adder, 1.0 
control unit, 1.0 
fault, 2.8.3 
instruction encoding, 10.2.3 
multiplier, 1.0 
register file, 2.2.2 


flush (Cache Flush) 
cache replacement algorithm, 3.2.3 
dirbase RB (replacement block), 2.2.6 
flushing data cache, 3.3
instruction definition, 10.1
instruction timing, 10.3 
requirements summary, 3.3.6 
fmlow.dd (Floating-Point Multiply Low)
instruction definition, 10.1
instruction timing, 10.3


fmov.r (Floating-Point Reg-Reg Move)
instruction definition, 10.1
instruction timing, 10.3


fmul.p (Floating-Point Multiply)
instruction definition, 10.1
instruction timing, 10.3


fnop (Floating-Point No Operation)
instruction definition, 10.1
instruction timing, 10.3


form (OR with MERGE Register)
instruction definition, 10.1
instruction timing, 10.3


frcp.p (Floating-Point Reciprocal)
instruction definition, 10.1
instruction timing, 10.3


frsqr.p (Floating-Point Reciprocal Square Root)
instruction definition, 10.1
instruction timing, 10.3


fsr (floating-point status register)
format description, 2.2.8
pipelining status information, 2.6.1.2


fst.y (Floating-Point Store)
instruction definition, 10.1
instruction timing, 10.3


fsub.p (Floating-Point Subtract)
instruction definition, 10.1
instruction timing, 10.3


FTE (floating-point trap enable)
fsr format description, 2.2.8


FT (floating-point trap)
psr format description, 2.2.3


ftrunc.v (Floating-Point to Integer Conversion)
instruction definition, 10.1
instruction timing, 10.3


fxfr (Transfer F-P to Integer Register)
instruction definition, 10.1
instruction timing, 10.3


fzchkl (32-Bit Z-Buffer Check)
instruction definition, 10.1
instruction timing, 10.3


fzchks (16-Bit Z-Buffer Check)
instruction definition, 10.1
instruction timing, 10.3


FZ (flush zero)
fsr format description, 2.2.8


G


graphics unit
function of, 1.0


H


hardware interface
i860 XP microprocessor, 4.0


HIT# (cache inquiry hit)
signal description, 4.2.19


HITM# (hit modified line)
internal cache consistency, 3.3
signal description, 4.2.20


HLDA (bus hold acknowledge)
signal description, 4.2.21


HOLD (bus hold)
bus arbitration, 5.2
signal description, 4.2.22
i860 XP microprocessor

bus operation, 5.0 
functional description, 1.0 
hardware interface, 4.0 
instruction set, 8.0 
mechanical data, 7.0 
on-chip caches, 3.0 
programming interface, 2.0 
testability, 6.0 


IAT (instruction access trap) 
psr format description, 2.2.3 


IDCODE 
TAP encoding, 6.3 


IEEE Standard
for Binary Floating-Point Arithmetic, 1.0
P1149.1/D6 testability, 6.0 


IL (interlock) 
epsr format description, 2.2.4 


IM (interrupt mode) 
psr format description, 2.2.3


indefinite
special floating-point values, 2.1.3


inexact result
result exception fault, 2.8.3.2


initialization
at RESET, 5.5


infinity
special floating-point values, 2.1.3


IN (interrupt)
psr format description, 2.2.3


InLoop
STAT register description, 2.2.14


inquiry cycles

data cache states, 3.2.4.1 

for line being cached, 5.3.2.1 
for line being replaced, 5.3.2.2 
snooping, 5.3 

write-back, 5.3.1 
instruction
access fault, 2.8.4
characteristics, 10.4
CTRL-format, 10.2.2
definitions, 10.1
dual-operation, 2.6.3
encoding floating-point, 10.2.3
fault, 2.8.2
format and encoding, 10.2 
REG-format, 10.2.1 
timing, 10.3 


instruction cache 
bypassing, 3.3 
consistency, 3.3.2 
function of, 1.0 
operation, 3.2 
organization, 3.2.2 


instruction set 


abbreviations, 10.0 
extensions of i860 XR, 2.6
i860 XP microprocessor, 8.0 


INT/CS8 (interrupt/code-size 8-bits)
signal description, 4.2.24


integer
data type, 2.1.1
register file, 2.2.1 


internal cache 
consistency, 3.3 


interrupt 
acknowledge cycles, 5.1.4
i860 XP microprocessor, 2.8
trap, 2.8.8


INT (interrupt)
epsr format description, 2.2.4


intovr (Software Trap on Integer Overflow) 
instruction definition, 10.1 
instruction timing, 10.3 


INT pin 
epsr INT (interrupt), 2.2.4 
psr IM (interrupt mode), 2.2.3 
invalidation requirements
summary, 3.3.6


INV (invalidate)
signal description, 4.2.23


IR (instruction register)
test, 6.3


IRP (integer graphics)
fsr format description, 2.2.8


ITI (cache and TLB invalidate)
dirbase format description, 2.2.6


ixfr (Transfer Integer to F-P Register)
instruction definition, 10.1
instruction timing, 10.3


IT (instruction trap)
psr format description, 2.2.3


K


KB0, KB1 (cache block)
signal description, 4.2.25


KEN# (cache enable)
BE7#-BE0# (byte enables), 4.2.4
bypassing instruction and data cache, 3.3
DCCU addressing, 2.5.2
internal instruction and data caches, 3.2
locked access, 3.2.4.3
signal description, 4.2.26


KI
special purpose register description, 2.2.9


KNF (kill next floating-point instruction)
psr format description, 2.2.3


KR
special purpose register description, 2.2.9


L


LB (late back-off mode)
dirbase format description, 2.2.6


LCC (loop condition code)
psr CC (condition code), 2.2.3


ld.c (Load from Control Register)
fir (fault instruction register), 2.2.7
instruction definition, 10.1
instruction timing, 10.3


ldint.x (Load Interrupt Vector)
big endian mode, 2.3
epsr BE (big endian), 2.2.4
extensions of i860 XR, 2.6
instruction definition, 10.1
instruction timing, 10.3


ldio.x (Load I/O)
big endian mode, 2.3
extensions of i860 XR, 2.6
instruction definition, 10.1
instruction timing, 10.3


ld.l
flushing data cache, 3.3


ld.x (Load Integer)
DCCU internals, 2.5.3
instruction definition, 10.1
instruction timing, 10.3


LFBSR (linear feedback shift register)
cache replacement algorithm, 3.2.3


LEN (data length)
signal description, 4.2.27


little endian mode
addressing, 2.3


load pipe
consistency, 3.3.5


LOCK# (address lock)
A (accessed) bit, 2.4.4.6
cycle attribute, 5.4
dirbase BL (bus lock), 2.2.6
signal description, 4.2.28


lock (Begin Interlocked Sequence)
dirbase BL (bus lock), 2.2.6
instruction definition, 10.1
instruction timing, 10.3
locked access, 3.2.4.3
locked access
cache consistency, 3.2.4.3


lock instruction
epsr IL (interlock), 2.2.4


lock protocol 
instruction fault, 2.8.2.1 


LRP0 (load pipe result precision)
fsr format description, 2.2.8 


LRP1 (load pipe result precision) 


fsr format description, 2.2.8 


M


MA
fsr U-bit (update bit), 2.2.8


mechanical data
i860 XP microprocessor, 7.0


MERGE
special purpose register description, 2.2.9


MESI
cache consistency protocol, 3.2.4
write cycle reordering, 5.3.3


MI
fsr U-bit (update bit), 2.2.8


M/IO# (memory/IO)
signal description, 4.2.29


MO
fsr U-bit (update bit), 2.2.8


mov (Constant-to-Register Move)
instruction definition, 10.1


mov (Register-Register Move)
instruction definition, 10.1
instruction timing, 10.3


MU
fsr U-bit (update bit), 2.2.8
NA# (next address request)
locked access, 3.2.4.3
signal description, 4.2.30
write-once policy, 3.2.4.2


NaN (Not a Number) 
special floating-point values, 2.1.3 


NENE# (next near) 
dirbase DPS (DRAM page size), 2.2.6 
signal description, 4.2.31 


Nested 
STAT register description, 2.2.14 


NEWCURR register 
DCCU internals, 2.5.3
format description, 2.2.13 


nonpipelined cycle 
bus cycle, 5.1.3 


nop (Core-Unit No Operation) 
instruction definition, 10.1 
instruction timing, 10.3 


O 


offset 


addressing modes, 2.7 
virtual address, 2.4.2


OF (overflow flag)
epsr format description, 2.2.4


on-chip caches
i860 XP microprocessor, 3.0 


ordinal 
data type, 2.1.2 

orh (Logical OR High)
instruction definition, 10.1 
instruction timing, 10.3 


or (Logical OR) 
instruction definition, 10.1 
instruction timing, 10.3 
output pins
pins overview, 4.1


overflow
result exception fault, 2.8.3.2


P


package
thermal specifications, 8.0


PAGE
virtual address, 2.4.2


page directory
little endian mode, 2.3
page tables, 2.4.3


paged virtual-address space
addressing, 2.3


page frame
address, 2.4.4.1
physical main memory, 2.4.1


page table
combining protection, 2.4.4.8
consistency, 3.3.3
entry format description, 2.4.4
format description, 2.4.3
little endian mode, 2.3
for trap handlers, 2.4.4.7


paging unit
address translation caches, 3.1
function of, 1.0


parallelism
dual-instruction mode, 2.6.2
use of, 2.6


parity error
bear (bus error address register), 2.2.10
psr IM (interrupt mode), 2.2.3
trap, 2.8.6


pause-DR
test state, 6.4.8


pause-IR
test state, 6.4.14


PBM (page-table bit mode)
epsr format description, 2.2.4


PCD (page cache disable)
bypassing instruction and data cache, 3.3
CD (cache disable), 2.4.4.5
signal description, 4.2.32


PCHK# (parity check)
signal description, 4.2.33


PCYC (page cycle)
signal description, 4.2.34


PEF (parity error flag)
epsr format description, 2.2.4


PEN# (parity enable)
bear (bus error address register), 2.2.10
parity error trap, 2.8.6
signal description, 4.2.35


performance optimizations
software compatibility, 10.5.2


pfaddp (Pipelined Add with Pixel Merge)
instruction definition, 10.1
instruction timing, 10.3


pfadd.p (Pipelined Floating-Point Add)
instruction definition, 10.1
instruction timing, 10.3


pfaddz (Pipelined Add with Z Merge)
instruction definition, 10.1
instruction timing, 10.3


pfamov.r (Pipelined Floating-Point Adder Move)
instruction definition, 10.1
instruction timing, 10.3


pfam.p (Pipelined Floating-Point Add and Multiply)
dual-operation, 2.6.3
instruction definition, 10.1
instruction timing, 10.3
special purpose registers, 2.2.9


pfeq.p (Pipelined Floating-Point Equal Compare)
instruction definition, 10.1
instruction timing, 10.3


pfgt.p (Pipelined Floating-Point Greater-Than Compare)
instruction definition, 10.1
instruction timing, 10.3


pfiadd.w (Pipelined Long-Integer Add) 
instruction definition, 10.1 
instruction timing, 10.3 


pfisub.w (Pipelined Long-Integer Subtract) 
instruction definition, 10.1 
instruction timing, 10.3 


pfix.v (Pipelined Floating-Point to Integer Conversion)
instruction definition, 10.1
instruction timing, 10.3


pfld (Pipelined Floating-Point Load)
epsr PT (trap on pipeline use), 2.2.4
load pipe consistency, 3.3.5
pipeline loads, 2.6.1.5


pfld.q
extensions of i860 XR, 2.6


pfld.y (Pipelined Floating-Point Load)
instruction definition, 10.1
instruction timing, 10.3


pfle.p (Pipelined F-P Less-Than or Equal Compare)
instruction definition, 10.1
instruction timing, 10.3


pfmam.p (Pipelined Floating-Point Add and Multiply)
dual operation, 2.6.3
instruction definition, 10.1
instruction timing, 10.3
special purpose registers, 2.2.9


pfmov.r (Pipelined Floating-Point Reg-Reg Move)
instruction definition, 10.1
instruction timing, 10.3


pfmsm.p (Pipelined Floating-Point Subtract and Multiply)
dual operation, 2.6.3
instruction definition, 10.1
instruction timing, 10.3
special purpose registers, 2.2.9
pfmul3.dd (Three-Stage Pipelined Multiply)
instruction definition, 10.1
instruction timing, 10.3


pfmul.p (Pipelined Floating-Point Multiply)
instruction definition, 10.1
instruction timing, 10.3


pform (Pipelined OR to MERGE Register)
instruction definition, 10.1
instruction timing, 10.3


pfsm.p (Pipelined Floating-Point Subtract and Multiply)
dual-operation, 2.6.3
instruction definition, 10.1
instruction timing, 10.3
special purpose registers, 2.2.9


pfsub.p (Pipelined Floating-Point Subtract)
instruction definition, 10.1
instruction timing, 10.3


pftrunc.v (Pipelined Floating-Point to Integer Conversion)
instruction definition, 10.1
instruction timing, 10.3


pfzchkl (Pipelined 32-Bit Z-Buffer Check) 
instruction definition, 10.1 
instruction timing, 10.3 


pfzchks (Pipelined 16-Bit Z-Buffer Check) 
instruction definition, 10.1 
instruction timing, 10.3 


physical main memory


page frame, 2.4.1 


physical tags 


internal instruction and data caches, 3.2 


PI bit 
using, 2.8.2.2 


PIM (previous interrupt mode) 
psr format description, 2.2.3 


pins overview 
hardware interface, 4.1 
pipeline
cycles, 5.1.3
loads, 2.6.1.5
operations, 2.6.1
precision in, 2.6.1.3
scalar transition, 2.6.1.4
status information, 2.6.1.2


PI (pipeline instruction)
epsr format description, 2.2.4


pixel
data type, 2.1.4


PM (pixel mask)
psr format description, 2.2.3


P (present)
page-table entries (PTEs), 2.4.4.2


privileged registers
format description, 2.2.11


processor
type, 2.2.4


programming interface
i860 XP microprocessor, 2.0


PS (pixel size)
psr format description, 2.2.3


psr (processor status register)
debugging i860 XP microprocessor, 2.9
format description, 2.2.3
page-table entries (PTEs), 2.4.4.3


pst.d (Pixel Store)
instruction definition, 10.1
instruction timing, 10.3
psr PS (pixel size) and PM (pixel mask), 2.2.3


PT (trap on pipeline use)
epsr format description, 2.2.4
using, 2.8.2.2


PU (previous user mode)
psr format description, 2.2.3


PWT (page write-through)
signal description, 4.2.36
WT (write-through), 2.4.4.4


R


ratings
absolute maximum, 9.1


RB (replacement block)
dirbase format description, 2.2.6


RC (replacement control)
dirbase format description, 2.2.6


REG-format
instructions, 10.2.1


register cell ordering
boundary scan, 6.5


replacement algorithm
cache, 3.2.3


RESET (system reset)
AHOLD (address hold), 4.2.3
bear (bus error address register), 2.2.10
cache replacement algorithm, 3.2.3
epsr BEF (bus error flag), 2.2.4
epsr SO (strong ordering), 2.2.4
initialization, 5.5
signal description, 4.2.37
trap, 2.8.9


restart
bus cycle, 5.2.2


result exception fault
floating-point, 2.8.3.1


right-shift instruction
psr SC (shift count), 2.2.3


RM (rounding mode)
fsr format description, 2.2.8


RR (result register)
fsr format description, 2.2.8


run-test/idle
test state, 6.4.2
S


SAMPLE
TAP encoding, 6.3


scalar
mode, 2.6.1.1
operations, 2.6.1
pipelined transition, 2.6.1.4


SC (shift count)
psr format description, 2.2.3


scyc.x (Special Cycles)
big endian mode, 2.3
epsr BE (big endian), 2.2.4
extensions of i860 XR, 2.6
instruction definition, 10.1
instruction timing, 10.3


select-DR-scan
test state, 6.4.3


select-IR-scan
test state, 6.4.4


serializing
locked access, 3.2.4.3


SE (source exception)
fsr format description, 2.2.8


shift-DR
test state, 6.4.6


shift-IR
test state, 6.4.12


shl (Shift Left)
instruction definition, 10.1
instruction timing, 10.3


shra (Shift Right Arithmetic)
instruction definition, 10.1
instruction timing, 10.3


shrd (Shift Right Double)
instruction definition, 10.1
instruction timing, 10.3


shr (Shift Right)
instruction definition, 10.1
instruction timing, 10.3


signal description
hardware interface, 4.2


single-precision real
data type, 2.1.3


single-transfer cycle
bus cycle, 5.1.1


SI (sticky inexact)
fsr format description, 2.2.8


snooping
inquiry cycles, 5.3
internal instruction and data caches, 3.2
responsibility limits, 5.3.2


software compatibility
required changes, 10.5.1


SO (strong ordering)
epsr format description, 2.2.4


source exception fault
floating-point, 2.8.3.1


spare
signal description, 4.2.38


special bus
cycles, 5.1.5


special-purpose registers
register set, 2.2


special values
floating-point numbers, 2.1.3


STAT register
DCCU internals, 2.5.3
format description, 2.2.14




st.c (Store to Control Register)
address translation, 2.4
dirbase BL (bus lock), 2.2.6
dirbase CS8 (code size 8-bit), 2.2.6
fsr U-bit (update bit), 2.2.8
instruction definition, 10.1
instruction timing, 10.3
privileged registers, 2.2.11


stepping number
epsr format description, 2.2.4


stio.x (Store I/O)
big endian mode, 2.3
epsr BE (big endian), 2.2.4
extensions of i860 XR, 2.6
instruction definition, 10.1
instruction timing, 10.3


strong ordering mode
inquiry cycle, 5.3.4


st.x (Store Integer)
DCCU internals, 2.5.3
instruction definition, 10.1
instruction timing, 10.3


subs (Subtract Signed)
epsr OF (overflow flag), 2.2.4
instruction definition, 10.1
instruction timing, 10.3


subu (Subtract Unsigned)
epsr OF (overflow flag), 2.2.4
instruction definition, 10.1
instruction timing, 10.3


supervisor/user mode
addressing, 2.3
ccr (concurrency control register), 2.2.12
psr U (user mode), 2.2.3


T
special purpose register description, 2.2.9


tags
internal instruction and data caches, 3.2


TAI (Trap On Autoincrement)
epsr format description, 2.2.4
fsr U-bit (update bit), 2.2.8


TAP (test access port)
controller, 6.4
controller initialization, 6.6
testability, 6.0


TCK (test clock)
signal description, 4.2.39


TDI (test data input)
signal description, 4.2.40


TDO (test data output)
signal description, 4.2.41


test
architecture, 6.1
data registers, 6.2


testability
i860 XP microprocessor, 6.0


test-logic-reset
test state, 6.4.1


test state
capture-DR, 6.4.5
capture-IR, 6.4.11
exit1-DR, 6.4.7
exit1-IR, 6.4.13
exit2-DR, 6.4.9
exit2-IR, 6.4.15
pause-DR, 6.4.8
pause-IR, 6.4.14
run-test/idle, 6.4.2
select-DR-scan, 6.4.3
select-IR-scan, 6.4.4
shift-DR, 6.4.6
shift-IR, 6.4.12
test-logic-reset, 6.4.1
update-DR, 6.4.10
update-IR, 6.4.16


thermal specifications
package, 8.0
TI (trap inexact)
fsr format description, 2.2.8


TLB
address translation caches, 3.1
DCCU addressing, 2.5.2
internal cache consistency, 3.3


TMS (test mode select)
signal description, 4.2.42


trap handler
invocation, 2.8.1
page tables, 2.4.4.7


trap (Software Trap)
bus error, 2.8.7
i860 XP microprocessor, 2.8
instruction cache consistency, 3.3.2
instruction definition, 10.1
instruction timing, 10.3
interrupt, 2.8.8
parity error, 2.8.6
RESET, 2.8.9


tri-state
output pins, 4.1


TRST# (test reset)
signal description, 4.2.43


U


U-bit (update bit)
fsr format description, 2.2.8


underflow
result exception fault, 2.8.3.2


unlock (End Interlocked Sequence)
dirbase BL (bus lock), 2.2.6
epsr IL (interlock), 2.2.4
instruction definition, 10.1
instruction timing, 10.3


update-DR
test state, 6.4.10


update-IR
test state, 6.4.16


user/supervisor mode
ccr (concurrency control register), 2.2.12
psr U (user mode), 2.2.3


U (user)
page-table entries (PTEs), 2.4.4.3
psr format description, 2.2.3


V


VccCLK (clock power)
signal description, 4.2.45


Vcc (system ground)
signal description, 4.2.44


virtual address
address translation caches, 3.1
CCUBASE, 2.2.12
format description, 2.4.2
i860 XP microprocessor, 2.4


virtual tag
instruction cache, 3.2.2
internal instruction and data caches, 3.2


Vss (ground)
signal description, 4.2.44


W


wait state
single-transfer cycle, 5.1.1


WB/WT# (write-back/write-through)
signal description, 4.2.46
write-once policy, 3.2.4.2


WP (write protect)
epsr format description, 2.2.4
page-table entries (PTEs), 2.4.4.3


W/R# (write/read)
signal description, 4.2.47
write-once policy, 3.2.4.2
write-back
data cache update policy, 3.2.1.1
with FLINE#, 5.3.5.2
inquiry cycles, 5.3.1
scheduling inquiry cycles, 5.3.5


write cycle
reordering due to buffering, 5.3.3


write-once
cache consistency, 3.2.4.2
data cache update policy, 3.2.1.1


write-through
data cache update policy, 3.2.1.1


WT (write-through)
page-table entries (PTEs), 2.4.4.4
write-through policy, 3.2.1.1


W (writable)
page-table entries (PTEs), 2.4.4.3


X


xorh (Logical Exclusive OR High)
instruction definition, 10.1
instruction timing, 10.3


xor (Logical Exclusive OR)
instruction definition, 10.1
instruction timing, 10.3


Z


Z-buffer
special purpose registers, 2.2.9




i860™ XR 64-BIT MICROPROCESSOR


PRELIMINARY


■ Parallel Architecture that Supports Up to Three Operations per Clock
— One Integer or Control Instruction per Clock
— Up to Two Floating-Point Results per Clock

■ High Performance Design
— 25/33.3/40 MHz Clock Rates
— 80 Peak Single Precision MFLOPs
— 60 Peak Double Precision MFLOPs
— 64-Bit External Data Bus
— 64-Bit Internal Instruction Cache Bus
— 128-Bit Internal Data Cache Bus

■ High Level of Integration on One Chip
— 32-Bit Integer and Control Unit
— 32/64-Bit Pipelined Floating-Point Adder and Multiplier Units
— 64-Bit 3-D Graphics Unit
— Paging Unit with Translation Lookaside Buffer
— 4 Kbyte Instruction Cache
— 8 Kbyte Data Cache

■ Compatible with Industry Standards
— ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic
— Intel386™/486™ Microprocessor Data Formats and Page Table Entries
— JEDEC 168-pin Ceramic Pin Grid Array Package (see Packaging ...)

■ Easy to Use
— On-Chip Debug Register
— Assembler, Linker, Simulator, Debugger, C and FORTRAN Compilers,
  FORTRAN Vectorizer, Scalar and Vector Math Libraries for both
  OS/2* and UNIX* Environments


The Intel i860™ XR Microprocessor (order codes A80860XR-25, A80860XR-33 and A80860XR-40) delivers
supercomputing performance in a single VLSI component. The 64-bit design of the i860 XR microprocessor
balances integer, floating point, and graphics performance for applications such as engineering workstations,
scientific computing, 3-D graphics workstations, and multiuser systems. Its parallel architecture achieves high
throughput with RISC design techniques, pipelined processing units, wide data paths, large on-chip caches,
million-transistor design, and fast one-micron CHMOS IV silicon technology.
1.0 FUNCTIONAL DESCRIPTION


As shown by the block diagram on the front page,
the i860 XR microprocessor consists of 9 units:


1. Core Execution Unit
2. Floating-Point Control Unit
3. Floating-Point Adder Unit
4. Floating-Point Multiplier Unit
5. Graphics Unit
6. Paging Unit
7. Instruction Cache
8. Data Cache
9. Bus and Cache Control Unit


The core execution unit controls overall operation of
the i860 XR microprocessor. The core unit executes
load, store, integer, bit, and control-transfer
operations, and fetches instructions for the
floating-point unit as well. A set of 32 x 32-bit
general-purpose registers is provided for the
manipulation of integer data. Load and store
instructions move 8-, 16-, and 32-bit data to and from
these registers. Its full set of integer, logical, and
control-transfer instructions gives the core unit the
ability to execute complete systems software and
applications programs. A trap mechanism provides rapid
response to exceptions and external interrupts.
Debugging is supported by the ability to trap on data
or instruction reference.


The floating-point hardware is connected to a sepa- 
rate set of floating-point registers, which can be 
accessed as 16 x 64-bit registers, or 32 x 32-bit reg- 
isters. Special load and store instructions can also 
access these same registers as 8 x 128-bit registers. 
All floating-point instructions use these registers as 
their source and destination operands. 
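
The three register views described above alias the same storage. The sketch below is not taken from the data sheet; it models the register file as 128 bytes and shows one 64-bit write appearing through the 32-bit view. The class name `FpRegisterFile`, its methods, and the little-endian byte order inside the store are illustrative assumptions.

```python
REG_FILE_BYTES = 32 * 4  # 32 registers x 32 bits = 128 bytes of storage


class FpRegisterFile:
    """Hypothetical model of one register store viewed at three widths."""

    def __init__(self):
        self.store = bytearray(REG_FILE_BYTES)

    @staticmethod
    def count(width_bits):
        """Number of registers visible when each is width_bits wide."""
        return REG_FILE_BYTES * 8 // width_bits

    def _span(self, index, width_bits):
        if width_bits not in (32, 64, 128):
            raise ValueError("width must be 32, 64, or 128 bits")
        if not 0 <= index < self.count(width_bits):
            raise IndexError("register index out of range for this view")
        size = width_bits // 8
        return slice(index * size, (index + 1) * size)

    def write(self, index, width_bits, value):
        span = self._span(index, width_bits)
        # Assumed byte order: least significant byte at the lowest offset.
        self.store[span] = value.to_bytes(width_bits // 8, "little")

    def read(self, index, width_bits):
        span = self._span(index, width_bits)
        return int.from_bytes(self.store[span], "little")


regs = FpRegisterFile()
regs.write(1, 64, 0x1122334455667788)       # second 64-bit register
print([FpRegisterFile.count(w) for w in (32, 64, 128)])  # [32, 16, 8]
print(hex(regs.read(2, 32)), hex(regs.read(3, 32)))      # 0x55667788 0x11223344
```

Under these assumptions the second 64-bit register overlaps the third and fourth 32-bit registers, which is the aliasing the paragraph describes.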


The floating-point control unit controls both the
floating-point adder and the floating-point multiplier,
issuing instructions, handling all source and result
exceptions, and updating status bits in the
floating-point status register. The adder and
multiplier can operate in parallel, producing up to two
results per clock. The floating-point data types,
floating-point instructions, and exception handling all
support the IEEE Standard for Binary Floating-Point
Arithmetic (ANSI/IEEE Std 754-1985).


The floating-point adder performs addition,
subtraction, comparison, and conversions on 64- and
32-bit floating-point values. An adder instruction
executes in three clocks; however, in pipelined mode, a
new result is generated every clock.
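
The difference between the three-clock latency and the one-result-per-clock rate can be made concrete with a small calculation. This is a sketch under simplifying assumptions that the data sheet does not state: the operations are independent, nothing stalls, and non-pipelined instructions do not overlap. The function name `clocks_needed` is hypothetical.

```python
def clocks_needed(operations, latency=3, pipelined=True):
    """Clocks to finish `operations` independent adder instructions.

    latency is the number of clocks one instruction takes to complete
    (three, per the text above).
    """
    if operations <= 0:
        return 0
    if pipelined:
        # First result after `latency` clocks, then one result per clock.
        return latency + (operations - 1)
    # Assumed: without pipelining each instruction completes before the next.
    return latency * operations


for n in (1, 4, 100):
    print(n, clocks_needed(n, pipelined=False), clocks_needed(n))
```

For long runs of independent operations the pipelined cost approaches one clock per result, which is where the peak rates quoted for the part come from.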


The floating-point multiplier performs floating-point
and integer multiply and floating-point reciprocal
operations on 64- and 32-bit floating-point values. A
multiplier instruction executes in three to four
clocks; however, in pipelined mode, a new result can be
generated every clock for single precision and every
other clock for double precision.
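
The pipelined rates above account for the peak MFLOPs figures on the front page. The following sketch simply adds the adder and multiplier result rates and scales by the clock rate; it assumes both units are kept busy on every clock, which is a best case rather than a typical workload. The names `ADDER_RATE`, `MULTIPLIER_RATE`, and `peak_mflops` are illustrative.

```python
from fractions import Fraction

# Results per clock in pipelined mode, as stated in the text.
ADDER_RATE = {"single": Fraction(1), "double": Fraction(1)}
MULTIPLIER_RATE = {"single": Fraction(1), "double": Fraction(1, 2)}


def peak_mflops(clock_mhz, precision):
    """Peak millions of results per second with both units busy."""
    per_clock = ADDER_RATE[precision] + MULTIPLIER_RATE[precision]
    return per_clock * clock_mhz


print(peak_mflops(40, "single"))  # 80
print(peak_mflops(40, "double"))  # 60
```

At 40 MHz this reproduces the 80 single-precision and 60 double-precision peak figures listed among the features.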


The graphics unit has special integer logic that sup- 
ports three-dimensional drawing in a graphics frame 
buffer, with color intensity shading and hidden sur- 
face elimination via the Z-buffer algorithm. The 
graphics unit recognizes the pixel as an 8-, 16-, or 
32-bit data type. It can compute individual red, blue, 
and green color intensity values within a pixel; but it 
does so with parallel operations that take advantage 
of the 64-bit internal word size and 64-bit external 
bus. The graphics features of the i860 XR micro- 
processor assume that the surface of a solid object 
is drawn with polygon patches whose shapes ap- 
proximate the original object. The color intensities of 
the vertices of the polygon and their distances from 
the viewer are known, but the distances and intensi-
ties of the other points must be calculated by inter-
polation. The graphics instructions of the i860 XR
microprocessor directly aid such interpolation. 


The paging unit implements protected, paged, virtual
memory via a 64-entry, four-way set-associative
memory called the TLB (Translation Lookaside Buff-
er). The paging unit uses the TLB to perform the
translation of logical address to physical address,
and to check for access violations. The access pro-
tection scheme employs two levels of privilege: user
and supervisor.


The instruction cache is a two-way set-associative 
memory of four Kbytes, with 32-byte blocks. It trans- 
fers up to 64 bits per clock (320 Mbyte/sec at 
40 MHz). 


The data cache is a two-way set-associative memo-
ry of eight Kbytes, with 32-byte blocks. It transfers
up to 128 bits per clock (640 Mbyte/sec at 40 MHz).
The i860 XR microprocessor normally uses write-
back caching, i.e. memory writes update the cache
(if applicable) without necessarily updating memory
immediately; however, caching can be inhibited by
software where necessary.
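

The peak transfer rates quoted for the instruction and data caches follow from the path width and the clock rate. As a quick check, here is a minimal C sketch of that arithmetic; the function name is illustrative and does not come from this data sheet.

```c
#include <stdint.h>

/* Peak transfer rate in Mbyte/sec for a path that moves
 * bits_per_clock bits on every clock at clock_mhz MHz.
 * Hypothetical helper, named here for illustration only. */
static uint32_t peak_mbyte_per_sec(uint32_t bits_per_clock, uint32_t clock_mhz)
{
    return (bits_per_clock / 8u) * clock_mhz;
}
```

With 64 bits per clock at 40 MHz the helper gives 320, and with 128 bits per clock it gives 640, which agrees with the instruction-cache and data-cache figures.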


The bus and cache control unit performs data and
instruction accesses for the core unit. It receives cy-
cle requests and specifications from the core unit,
performs the data-cache or instruction-cache miss
processing, controls TLB translation, and provides
the interface to the external bus. Its pipelined struc-
ture supports up to three outstanding bus cycles.


2.0 PROGRAMMING INTERFACE


The programmer-visible aspects of the architecture
of the i860 XR microprocessor include data types,
registers, instructions, and traps.
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2.1 Data Types 


The i860 XR microprocessor provides operations for
integer and floating-point data. Integer operations
are performed on 32-bit operands with some support
also for 64-bit operands. Load and store instructions
can reference 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit
operands. Floating-point operations are performed
on IEEE-standard 32- and 64-bit formats. Graphics-
oriented instructions operate on arrays of 8-, 16-, or
32-bit pixels.


2.1.1 INTEGER 


An integer is a 32-bit signed value in standard two's
complement form. A 32-bit integer can represent a
value in the range -2,147,483,648 (-2^31) to
2,147,483,647 (+2^31 - 1). Arithmetic operations on
8- and 16-bit integers can be performed by sign-ex-
tending the 8- or 16-bit values to 32 bits, then using
the 32-bit operations.


There are also add and subtract instructions that op- 
erate on 64-bit long integers. 


Load and store instructions may also reference (in
addition to the 32- and 64-bit formats previously
mentioned) 8- and 16-bit items in memory. When an
8- or 16-bit item is loaded into a register, it is con-
verted to an integer by sign-extending the value to
32 bits. When an 8- or 16-bit item is stored from a
register, the corresponding number of low-order bits
of the register are used.
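

The load and store conversions described above can be sketched in C. This is only a model of the stated behavior (sign extension on load, low-order bits on store); the function names are hypothetical and are not i860 mnemonics.

```c
#include <stdint.h>

/* Load of an 8-bit item: sign-extend to a 32-bit integer. */
static int32_t load_byte(uint8_t item)
{
    return (item & 0x80u) ? (int32_t)item - 256 : (int32_t)item;
}

/* Load of a 16-bit item: sign-extend to a 32-bit integer. */
static int32_t load_short(uint16_t item)
{
    return (item & 0x8000u) ? (int32_t)item - 65536 : (int32_t)item;
}

/* Store of an 8-bit item: only the low-order 8 bits of the register are used. */
static uint8_t store_byte(int32_t reg)
{
    return (uint8_t)((uint32_t)reg & 0xFFu);
}

/* Store of a 16-bit item: only the low-order 16 bits of the register are used. */
static uint16_t store_short(int32_t reg)
{
    return (uint16_t)((uint32_t)reg & 0xFFFFu);
}
```

For example, the byte 0xFF loads as the integer -1, and storing the integer -2 as a byte writes 0xFE.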
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2.1.2 ORDINAL 


Arithmetic operations are available for 32-bit ordi-
nals. An ordinal is an unsigned integer. An ordinal
can represent values in the range 0 to
4,294,967,295 (+2^32 - 1).


Also, there are add and subtract instructions that op- 
erate on 64-bit ordinals. 


2.1.3 SINGLE- AND DOUBLE-PRECISION REAL 


Figure 2.1 shows the real number formats. A single-
precision real (also called “single real”) data type is
a 32-bit binary floating-point number. Bit 31 is the
sign bit; bits 30..23 are the exponent; and bits 22..0
are the fraction. In accordance with ANSI/IEEE
standard 754, the value of a single-precision real is
defined as follows:


1. If e = 0 and f ≠ 0 or e = 255, then generate a
   floating-point source-exception trap when en-
   countered in a floating-point operation.


2. If 0 < e < 255, then the value is (-1)^s × 1.f ×
   2^(e-127).


3. If e = 0 and f = 0, then the value is signed zero.


A double-precision real (also called “double real”)
data type is a 64-bit binary floating-point number. Bit
63 is the sign bit; bits 62..52 are the exponent; and
bits 51..0 are the fraction. In accordance with ANSI/
IEEE standard 754, the value of a double-precision
real is defined as follows:


1. If e = 0 and f ≠ 0 or e = 2047, then generate a
   floating-point source-exception trap when en-
   countered in a floating-point operation.


2. If 0 < e < 2047, then the value is (-1)^s × 1.f ×
   2^(e-1023).


3. If e = 0 and f = 0, then the value is signed zero.


Single-Precision Real:  SIGN (bit 31), EXPONENT (bits 30..23), FRACTION (bits 22..0)
Double-Precision Real:  SIGN (bit 63), EXPONENT (bits 62..52), FRACTION (bits 51..0)


Figure 2.1. Real Number Formats


The special values infinity, NaN (“Not a Number”),
indefinite, and denormal generate a trap when en-
countered. The trap handler implements IEEE-stan-
dard results.
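

The three cases in the definitions above can be expressed as a small classifier. The C sketch below only sorts a bit pattern into "generates a source-exception trap", "signed zero", or "ordinary value"; the enum and function names are assumptions made for illustration.

```c
#include <stdint.h>

enum real_class {
    REAL_TRAPS,   /* case 1: denormal, infinity, or NaN encoding */
    REAL_VALUE,   /* case 2: (-1)^s x 1.f x 2^(e - bias)         */
    REAL_ZERO     /* case 3: signed zero                         */
};

/* Single real: sign bit 31, exponent bits 30..23, fraction bits 22..0. */
static enum real_class classify_single(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t f = bits & 0x7FFFFFu;

    if (e == 255u || (e == 0u && f != 0u))
        return REAL_TRAPS;
    return (e == 0u) ? REAL_ZERO : REAL_VALUE;
}

/* Double real: sign bit 63, exponent bits 62..52, fraction bits 51..0. */
static enum real_class classify_double(uint64_t bits)
{
    uint64_t e = (bits >> 52) & 0x7FFu;
    uint64_t f = bits & 0x000FFFFFFFFFFFFFULL;

    if (e == 2047u || (e == 0u && f != 0u))
        return REAL_TRAPS;
    return (e == 0u) ? REAL_ZERO : REAL_VALUE;
}
```

The pattern 0x3F800000 (the single real 1.0) falls under case 2, 0x80000000 is a signed zero, and 0x7F800000 (infinity) falls under case 1.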


A double real value occupies an even/odd pair of
floating-point registers. Bits 31..0 are stored in the
even-numbered floating-point register; bits 63..32
are stored in the next higher odd-numbered floating-
point register.


2.1.4 PIXEL


A pixel may be 8, 16, or 32 bits long depending on
color and intensity resolution requirements. Regard-
less of the pixel size, the i860 XR microprocessor
always operates on 64 bits worth of pixels at a time.
The pixel data type is used by two kinds of instruc-
tions:


• The selective pixel-store instruction that helps im-
  plement hidden surface elimination.


• The pixel add instruction that helps implement
  3-D color intensity shading.


To perform color intensity shading efficiently in a va- 
riety of applications, the i860 XR microprocessor de- 
fines three pixel formats according to Table 2.1. 


Figure 2.2 illustrates one way of assigning meaning
to the fields of pixels. These assignments are for
illustration purposes only. The i860 XR microproces-
sor defines only the field sizes, not the specific use
of each field. Other ways of using the fields of pixels
are possible.


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Table 2.1. Pixel Formats 


Columns: Pixel Size (in bits) | Bits of Intensity (per color) | Bits of Other Attribute (Texture)


The intensity attribute fields may be assigned to colors in
any order convenient to the application.


*With 8-bit pixels, up to 8 bits can be used for intensity; the
remaining bits can be used for any other attribute, such as
color. The intensity bits must be the low-order bits of the
pixel.


2.2 Register Set


As Figure 2.3 shows, the i860 XR microprocessor
has the following registers:


• An integer register file

• A floating-point register file

• Six control registers (psr, epsr, db, dirbase, fir,
  and fsr)

• Four special-purpose registers (KR, KI, T, and
  MERGE)


The control registers are accessible only by load
and store control-register instructions; the integer
and floating-point registers are accessed by arithme-
tic operations and load and store instructions. The
special-purpose registers KR, KI, T, and MERGE are
used by a few specific instructions.


I = Intensity, R = Red intensity, G = Green intensity, B = Blue intensity, C = Color, T = Texture
These assignments of specific meanings to the fields of pixels are for illustration purposes only. Only the field sizes are
defined, not the specific use of each field.


Figure 2.2. Pixel Format Example 
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2.2.1 INTEGER REGISTER FILE 


There are 32 integer registers, each 32 bits wide, 
referred to as r0 through r31, which are used for 
address computation and scalar integer computa-
tions. Register r0 always returns zero when read,
independently of what is stored in it.


2.2.2 FLOATING-POINT REGISTER FILE 


There are 32 floating-point registers, each 32 bits
wide, referred to as f0 through f31, which are used
for floating-point computations. Registers f0 and f1 
always return zero when read, independently of 
what is stored in them. The floating-point registers 
are also used by a set of graphics operations, pri- 
marily for 3D graphics computations. 


When accessing 64-bit floating-point or integer val-
ues, the i860 XR microprocessor uses an even/odd
pair of registers. When accessing 128-bit values, it
uses an aligned set of four registers (f0, f4, f8, ...,
f28). The instruction must designate the lowest reg-
ister number of the set of registers containing 64- or
128-bit values. Misaligned register numbers produce
undefined results. The register with the lowest num-
ber contains the least significant part of the value.
For 128-bit values, the register pair with the lower
numbers contains the least significant 64 bits while
the register pair with the higher numbers contains the
most significant 64 bits.
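

The register-numbering rules above can be modeled with a plain array of 32-bit words. The C sketch below is illustrative only: the helper names are hypothetical, and the model ignores the rule that f0 and f1 always read as zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when reg is a legal lowest register number for an operand of
 * width_bits (32, 64, or 128): even for pairs, a multiple of four for
 * 128-bit sets. */
static bool freg_number_ok(unsigned reg, unsigned width_bits)
{
    if (width_bits != 32u && width_bits != 64u && width_bits != 128u)
        return false;
    return reg < 32u && (reg % (width_bits / 32u)) == 0u;
}

/* Write a 64-bit value to an even/odd pair: the lower-numbered
 * register receives the least significant 32 bits. */
static void put_64(uint32_t f[32], unsigned reg, uint64_t value)
{
    f[reg]      = (uint32_t)(value & 0xFFFFFFFFu);
    f[reg + 1u] = (uint32_t)(value >> 32);
}

/* Read the 64-bit value held in the pair starting at reg. */
static uint64_t get_64(const uint32_t f[32], unsigned reg)
{
    return ((uint64_t)f[reg + 1u] << 32) | f[reg];
}
```

For instance, f4 is a valid base for a 128-bit operand and f6 is not; writing the double real 1.0 (0x3FF0000000000000) to the pair f2/f3 leaves 0 in f2 and 0x3FF00000 in f3.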
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The 128-bit load and store instructions, along with 
the 128-bit data path between the floating-point reg- 
isters and the data cache help to sustain the extraor- 
dinarily high rate of computation. 


2.2.3 PROCESSOR STATUS REGISTER 


The processor status register (psr) contains miscel- 
laneous state information for the current process. 
Figure 2.4 shows the format of the psr. 


• BR (Break Read) and BW (Break Write) enable a
  data access trap when the operand address
  matches the address in the db register and a
  read or write (respectively) occurs.


• Various instructions set CC (Condition Code) ac-
  cording to tests they perform. The branch-on-
  condition-code instructions test its value. The bla
  instruction sets and tests LCC (Loop Condition
  Code).


• IM (Interrupt Mode) enables external interrupts if
  set; disables interrupts if clear.


• U (User Mode) is set when the i860 XR micro-
  processor is executing in user mode; it is clear
  when the i860 XR microprocessor is executing in
  supervisor mode. In user mode, writes to some
  control registers are inhibited. This bit also con-
  trols the memory protection mechanism. See
  section 2.4.4.3 for a description of memory pro-
  tection in user and supervisor modes.


[Block diagram. Labels: integer registers, floating-point registers, control registers, instruction cache, adder unit, graphics unit, external memory, address paths. Path widths shown: 32, 64, and 128 bits.]
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Figure 2.3. Registers and Data Paths 
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DELAYED SWITCH 
DUAL INSTRUCTION MODE
KILL NEXT FLOATING-POINT INSTRUCTION
(RESERVED)
SHIFT COUNT
PIXEL SIZE
PIXEL MASK


*Can be changed only from supervisor level.


Figure 2.4 Processor Status Register 


(RESERVED)
OVERFLOW FLAG
BIG ENDIAN MODE
PAGE-TABLE BIT MODE
DATA CACHE SIZE
WRITE-PROTECT MODE
INTERLOCK
STEPPING NUMBER
PROCESSOR TYPE
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*Can be changed only from supervisor level. 


Figure 2.5 Extended Processor Status Register 


• PIM (Previous Interrupt Mode) and PU (Previous
  User Mode) save the corresponding status bits
  (IM and U) on a trap, because those status bits
  are changed when a trap occurs. They are re-
  stored into their corresponding status bits when
  returning from a trap handler with a branch indi-
  rect instruction when a trap flag is set in the psr.


• FT (Floating-Point Trap), DAT (Data Access
  Trap), IAT (Instruction Access Trap), IN (Inter-
  rupt), and IT (Instruction Trap) are trap flags.
  They are set when the corresponding trap condi-
  tion occurs. The trap handler examines these bits
  to determine which condition or conditions have
  caused the trap.
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° DS (Delayed Switch) is set if a trap occurs during 


  the instruction before dual-instruction mode is en-
  tered or exited. If DS is set and DIM (Dual Instruc-
  tion Mode) is clear, the i860 XR microprocessor
  switches to dual-instruction mode one instruction
  after returning from the trap handler. If DS and
  DIM are both set, the i860 XR microprocessor
  switches to single-instruction mode one instruc-
  tion after returning from the trap handler.


• When a trap occurs, the i860 XR microprocessor
  sets DIM if it is executing in dual-instruction
  mode; it clears DIM if it is executing in single-in-
  struction mode. If DIM is set after returning from a
  trap handler, the i860 XR microprocessor re-
  sumes execution in dual-instruction mode.
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e When KNF (kill Next Floating-Point Instruction) is 
set, the next floating-point instruction is sup- 
pressed (except that its dual-instruction mode bit 
is interpreted). A trap handler sets KNF if the 
trapped floating-point instruction should not be 
reexecuted. 


• SC (Shift Count) stores the shift count used by
  the last right-shift instruction. It controls the num-
  ber of shifts executed by the double-shift instruc-
  tion.


• PS (Pixel Size) and PM (Pixel Mask) are used by
  the pixel-store instruction and by the graphics in-
  structions. The values of PS control pixel size as
  defined by Table 2.2. The bits in PM correspond
  to pixels to be updated by the pixel-store instruc-
  tion pst.d. The low-order bit of PM corresponds
  to the low-order pixel of the 64-bit source oper-
  and of pst.d. The number of low-order bits of PM
  that are actually used is the number of pixels that
  fit into 64 bits, which depends upon PS. If a bit of
  PM is set, then pst.d stores the corresponding
  pixel. Refer also to the pst.d instruction in section
  8.


Table 2.2. Values of PS


Value   Pixel Size   Pixel Size
        in bits      in bytes

00      8            1
01      16           2
10      32           4
11      (undefined)  (undefined)
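

The interaction of PS and PM described in the psr field list can be sketched as follows. This C model covers only the selection rule stated there (bit i of PM gates pixel i of the 64-bit source operand); it assumes the PS encoding 0, 1, 2 for 8-, 16-, and 32-bit pixels, it does not model any other effect of pst.d, and the function names are hypothetical.

```c
#include <stdint.h>

/* Number of pixels that fit into 64 bits for a given PS value
 * (0 for the undefined encoding). */
static unsigned pixels_per_64(unsigned ps)
{
    return (ps < 3u) ? 64u / (8u << ps) : 0u;
}

/* Model of the pixel selection done by pst.d: every pixel of src whose
 * PM bit is set replaces the corresponding pixel of mem. */
static uint64_t pst_d_model(uint64_t mem, uint64_t src, unsigned ps, unsigned pm)
{
    unsigned n = pixels_per_64(ps);
    unsigned bits, i;
    uint64_t field;

    if (n == 0u)
        return mem;                       /* undefined PS: leave memory alone */

    bits = 8u << ps;                      /* pixel size in bits: 8, 16, or 32 */
    field = ((uint64_t)1 << bits) - 1u;   /* mask covering one pixel          */

    for (i = 0u; i < n; i++) {
        if ((pm >> i) & 1u) {
            uint64_t mask = field << (i * bits);
            mem = (mem & ~mask) | (src & mask);
        }
    }
    return mem;
}
```

With 16-bit pixels (PS = 1) and PM = 0101 binary, only pixels 0 and 2 of the source are stored; the other two pixels in memory keep their old contents.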


ADDRESS TRANSLATION ENABLE 
DRAM PAGE SIZE 
BUS LOCK 

I-CACHE, TLB INVALIDATE
(RESERVED)
CODE SIZE 8-BIT


2.2.4 EXTENDED PROCESSOR STATUS
REGISTER


The extended processor status register (epsr) con-
tains additional state information for the current pro-
cess beyond that stored in the psr. Figure 2.5 shows
the format of the epsr.


• The processor type is one for the i860 XR micro-
  processor.


• The stepping number has a unique value that dis-
  tinguishes among different revisions of the proc-
  essor.


• IL (Interlock) is set if a trap occurs after a lock
  instruction but before the load or store following
  the subsequent unlock instruction. IL indicates to
  the trap handler that a locked sequence has
  been interrupted. When the trap handler finds IL
  set, it should scan backwards for the lock in-
  struction and restart at that point. The absence of
  a lock instruction within 30-33 instructions of the
  trap indicates a programming error.


• WP (write protect) controls the semantics of the
  W bit of page table entries. A clear W bit in either
  the directory or the page table entry causes
  writes to be trapped. When WP is clear, writes
  are trapped in user mode, but not in supervisor
  mode. When WP is set, writes are trapped in both
  user and supervisor modes. After the value of the
  WP bit is changed, the TLB must be invalidated
  by setting the ITI bit of the dirbase register, be-
  fore any stores are performed.


• INT (Interrupt) is the value of the INT input pin.


• DCS (Data Cache Size) is a read-only field that
  tells the size of the on-chip data cache. The num-
  ber of bytes actually available is 2^(12+DCS); there-
  fore, a value of zero indicates 4 Kbytes, one indi-
  cates 8 Kbytes, etc.


REPLACEMENT BLOCK
REPLACEMENT CONTROL
DIRECTORY TABLE BASE (DTB)


*Can be changed only from supervisor level.


Figure 2.6. Directory Base Register
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© PBM (Page-Table Bit Mode) determines which bit 
of page-table entries is output on the PTB pin. 
When PBM is clear, the PTB signal reflects bit CD 
of the page-table entry used for the current cycle. 
When PBM is set, the PTB signal reflects bit WT 
of the page-table entry used for the current cycle. 


• BE (Big Endian) controls the ordering of bytes
  within a data item in memory. Normally (i.e. when
  BE is clear) the i860 XR microprocessor operates
  in little endian mode, in which the addressed byte
  is the low-order byte. When BE is set (big endian
  mode), the low-order three bits of all load and
  store addresses are complemented, then
  masked to the appropriate boundary for align-
  ment. This causes the addressed byte to be the
  most significant byte. Section 2.3 discusses little
  and big endian addressing.


• OF (Overflow Flag) is set by adds, addu, subs,
  and subu when integer overflow occurs. For
  adds and subs, OF is set if the carry from bit 31
  is different than the carry from bit 30. For addu,
  OF is set if there is a carry from bit 31. For subu,
  OF is set if there is no carry from bit 31. Under all
  other conditions, it is cleared by these instruc-
  tions. OF controls the function of the intovr
  instruction. OF cannot be written in user mode
  using st.c.
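

Several of the epsr rules above are simple bit manipulations, and a C sketch may make them easier to check. Everything below is a model under stated assumptions, not a description of the hardware: the function names are hypothetical, and subtraction is assumed to be performed as a + ~b + 1 so that "carry from bit 31" has its usual adder meaning.

```c
#include <stdbool.h>
#include <stdint.h>

/* DCS: bytes of on-chip data cache = 2^(12 + DCS). */
static uint32_t dcache_bytes(unsigned dcs)
{
    return (uint32_t)1 << (12u + dcs);
}

/* WP: a write is protected when W is clear in either the directory
 * entry or the page-table entry. Protected writes trap in user mode,
 * and in supervisor mode only when WP is set. */
static bool write_traps(bool w_dir, bool w_page, bool wp, bool user_mode)
{
    bool write_protected = !(w_dir && w_page);
    return write_protected && (user_mode || wp);
}

/* BE: complement the low-order three address bits, then mask to the
 * operand's alignment boundary (size_bytes must be a power of two). */
static uint32_t big_endian_address(uint32_t addr, uint32_t size_bytes)
{
    return (addr ^ 7u) & ~(size_bytes - 1u);
}

/* Carries out of bit 30 and bit 31 for a + b + carry_in. */
static void carries(uint32_t a, uint32_t b, uint32_t carry_in,
                    bool *c30, bool *c31)
{
    uint32_t low  = (a & 0x7FFFFFFFu) + (b & 0x7FFFFFFFu) + carry_in;
    uint64_t full = (uint64_t)a + b + carry_in;
    *c30 = (low >> 31) != 0u;
    *c31 = (full >> 32) != 0u;
}

/* OF after adds: carry from bit 31 differs from carry from bit 30. */
static bool of_adds(uint32_t a, uint32_t b)
{
    bool c30, c31;
    carries(a, b, 0u, &c30, &c31);
    return c30 != c31;
}

/* OF after addu: carry from bit 31. */
static bool of_addu(uint32_t a, uint32_t b)
{
    bool c30, c31;
    carries(a, b, 0u, &c30, &c31);
    return c31;
}

/* OF after subs (assumed a + ~b + 1): the two carries differ. */
static bool of_subs(uint32_t a, uint32_t b)
{
    bool c30, c31;
    carries(a, ~b, 1u, &c30, &c31);
    return c30 != c31;
}

/* OF after subu (assumed a + ~b + 1): no carry from bit 31. */
static bool of_subu(uint32_t a, uint32_t b)
{
    bool c30, c31;
    carries(a, ~b, 1u, &c30, &c31);
    return !c31;
}
```

As a spot check, a byte access to address 0 in big endian mode is redirected to byte 7 of the 64-bit word, and adding 1 to 0x7FFFFFFF with adds sets OF while the same operands with addu do not.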


2.2.5 DATA BREAKPOINT REGISTER 


The data breakpoint register (db) is used to gener- 
ate a trap when the i860 XR microprocessor makes 
a data-operand access to the address stored in this 
register. The trap is enabled by BR and BW in psr. 
The db register can only be changed from supervi-
sor level. When comparing, a number of low-order
bits of the address are ignored, depending on the
size of the operand. For example, a 16-bit access 
ignores the low-order bit of the address when com- 
paring to db; a 32-bit access ignores the low-order 
two bits. This ensures that any access that overlaps 
the address contained in the register will generate a 
trap. The DAT occurs before the data is accessed 
and prevents the load or store from completing. 
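

The address comparison described above amounts to masking off the low-order bits covered by the operand. A minimal C sketch, assuming operand sizes that are powers of two and using a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an access of size_bytes at addr matches the breakpoint
 * address in db. The low-order log2(size_bytes) address bits are
 * ignored, so any access that overlaps the db address matches. */
static bool db_matches(uint32_t db, uint32_t addr, uint32_t size_bytes)
{
    uint32_t ignore = size_bytes - 1u;   /* e.g. 1 for 16-bit, 3 for 32-bit */
    return ((db ^ addr) & ~ignore) == 0u;
}
```

A 16-bit access to 0x1000 therefore matches a db value of 0x1001, whereas an 8-bit access to 0x1000 does not.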


2.2.6 DIRECTORY BASE REGISTER 


The directory base register dirbase (shown in Figure
2.6) controls address translation, caching, and bus
options. The dirbase register can only be changed
from supervisor level. The BL bit is changed from
user level with the lock and unlock instructions.


• ATE (Address Translation Enable), when set, en-
  ables the virtual-address translation algorithm.
  The data cache must be flushed before changing
  the ATE bit.


• DPS (DRAM Page Size) controls how many bits
  to ignore when comparing the current bus-cycle
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address with the previous bus-cycle address to 
generate the NENE# signal. This feature allows
for higher speeds when using static column or
page-mode DRAMs and consecutive reads and
writes access the same row. The comparison ignores
the low-order 12 + DPS bits. A value of zero is
appropriate for one bank of 256K x n RAMs, 1
for 1M x n RAMs, etc. For interleaved memory,
increase DPS by one for each power of interleav-
ing: add one for 2-way, two for 4-way, etc.


• When BL (Bus Lock) is set, external bus access-
  es are locked. The LOCK# signal is asserted the
  next bus cycle whose internal bus request is gen-
  erated after BL is set. It remains set on every
  subsequent bus cycle as long as BL remains set.
  The LOCK# signal is deasserted on the next
  load or store instruction after BL is cleared. Traps
  immediately clear BL. The lock and unlock
  instructions control the BL bit. The result of modi-
  fying BL with the st.c instruction is not defined.


• ITI (I-Cache, TLB Invalidate), when set in the val-
  ue that is loaded into dirbase, causes all entries
  in the instruction cache and address-translation
  cache (TLB) to be invalidated. The ITI bit does
  not remain set in dirbase. ITI always appears as
  zero when reading dirbase. Section 2.5 discuss-
  es flushing the data cache before invalidating the
  TLB.


• When CS8 (Code Size 8-Bit) is set, instruction
  cache misses are processed as 8-bit bus cycles.
  When this bit is clear, instruction cache misses
  are processed as 64-bit bus cycles. This bit can-
  not be set by software; hardware sets this bit at
  initialization time. It can be cleared by software
  (one time only) to allow the system to execute out
  of 64-bit memory after bootstrapping from 8-bit
  EPROM. A nondelayed branch to code in 64-bit
  memory should directly follow the st.c (store con-
  trol register) instruction that clears CS8, in order
  to make the transition from 8-bit to 64-bit memory
  occur at the correct time. The branch instruction
  must be aligned on a 64-bit boundary.


• RB (Replacement Block) identifies the cache
  block to be replaced by cache replacement algo-
  rithms. The high-order bit of RB is ignored by the
  instruction and data caches. RB conditions the
  cache flush instruction flush, which is discussed
  in Section 8. Table 2.3 explains the values of RB.


• RC (Replacement Control) controls cache re-
  placement algorithms. Table 2.4 explains the sig-
  nificance of the values of RC.


• DTB (Directory Table Base) contains the high-or-
  der 20 bits of the physical address of the page
  directory when address translation is enabled (i.e.
  ATE = 1). The low-order 12 bits of the address
  are zeros.
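

Two of the dirbase fields above reduce to shifts, sketched here in C. The helper names are hypothetical; the first models only the address comparison that DPS controls, and the second takes the 20-bit DTB field value (not the whole dirbase register) as its argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* DPS: compare two bus-cycle addresses while ignoring the low-order
 * 12 + DPS bits. True means both addresses fall in the same DRAM page
 * for the purpose of the comparison described above. */
static bool same_dram_page(uint32_t prev_addr, uint32_t cur_addr, unsigned dps)
{
    unsigned ignored = 12u + dps;
    return (prev_addr >> ignored) == (cur_addr >> ignored);
}

/* DTB: the field holds the high-order 20 bits of the page directory's
 * physical address; the low-order 12 bits are zeros. */
static uint32_t page_directory_address(uint32_t dtb_field)
{
    return dtb_field << 12;
}
```

With DPS = 0, addresses 0x1000 and 0x1FFF compare equal but 0x1000 and 0x2000 do not; a DTB field of 0x12345 places the page directory at physical address 0x12345000.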
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Figure 2.7. Floating-Point Status Register 


Table 2.3. Values of RB

Value   Replace TLB Block   Replace Instruction and
                            Data Cache Block

00      0                   0
01      1                   1
10      2                   0
11      3                   1


Table 2.4. Values of RC

Value   Meaning

00      Selects the normal replacement algorithm where any block
        in the set may be replaced on cache misses in all caches.

01      Instruction, data, and TLB cache misses replace the block
        selected by RB. The instruction and data caches ignore the
        high-order bit of RB. This mode is used for instruction
        cache and TLB testing.

10      Data cache misses replace the block selected by the
        low-order bit of RB. Instruction and TLB caches use
        random replacement.

11      Disables data cache replacement. Instruction and TLB
        caches use random replacement.


2.2.7 FAULT INSTRUCTION REGISTER 


When a trap occurs, this register contains the ad-
dress of the trapping instruction (not necessarily the
instruction that created the conditions that required
the trap). The fir is a read-only register. In single-in-
struction mode, using a ld.c instruction to read the
fir anytime except the first time after a trap saves in
idest the address of the ld.c instruction; in dual-in-
struction mode, the address of its floating-point com-
panion (address of the ld.c - 4) is saved.


2.2.8 FLOATING-POINT STATUS REGISTER 


The floating-point status register (fsr) contains the
floating-point trap and rounding-mode status for the
current process. Figure 2.7 shows its format. The fsr
is writable in user level.


• If FZ (Flush Zero) is clear and underflow occurs,
  a result-exception trap is generated. When FZ is
  set and underflow occurs, the result is set to zero,
  and no trap due to underflow occurs.


• If TI (Trap Inexact) is clear, inexact results do not
  cause a trap. If TI is set, inexact results cause a
  trap. The sticky inexact flag (SI) is set whenever
  an inexact result is produced, regardless of the
  setting of TI.


• RM (Rounding Mode) specifies one of the four
  rounding modes defined by the IEEE standard.
  Given a true result b that cannot be represented
  by the target data type, the i860 XR microproces-
  sor determines the two representable numbers a
  and c that most closely bracket b in value (a < b
  < c). The i860 XR microprocessor then rounds
  (changes) b to a or c according to the mode se-
  lected by RM as defined in Table 2.5. Rounding
  introduces an error in the result that is less than
  one least-significant bit.


Table 2.5. Values of RM

Value   Rounding Mode               Rounding Action

00      Round to nearest or even    Closer to b of a or c; if equally
                                    close, select even number
                                    (the one whose least
                                    significant bit is zero).
01      Round down (toward -∞)      a
10      Round up (toward +∞)        c
11      Chop (toward zero)          Smaller in magnitude of a or c.


• The U-bit (Update Bit), if set in the value that is
  loaded into fsr by a st.c instruction, enables up-
  dating of the result-status bits (AE, AA, AI, AO,
  AU, MA, MI, MO, and MU) in the first stage of the
  floating-point adder and multiplier pipelines. If this
  bit is clear, the result-status bits are unaffected
  by a st.c instruction; st.c ignores the correspond-
  ing bits in the value that is being loaded. A st.c
  always updates fsr bits 21..17 and 8..0 directly.
  The U-bit does not remain set; it always appears
  as zero when read.


• The FTE (Floating-Point Trap Enable) bit, if clear,
  disables all floating-point traps (invalid input oper-
  and, overflow, underflow, and inexact result).


• SI (Sticky Inexact) is set when the last stage re-
  sult of either the multiplier or adder is inexact (i.e.
  when either AI or MI is set). SI is “sticky” in the
  sense that it remains set until reset by software.
  AI and MI, on the other hand, can be changed by
  the subsequent floating-point instruction.


• SE (Source Exception) is set when one of the
  source operands of a floating-point operation is
  invalid; it is cleared when all the input operands
  are valid. Invalid input operands include denor-
  mals, infinities, and all NaNs (both quiet and sig-
  naling).


• When read from the fsr, the result-status bits MA,
  MI, MO, and MU (Multiplier Add-One, Inexact,
  Overflow, and Underflow, respectively) describe
  the last stage result of the multiplier.


• When read from the fsr, the result-status bits AA,
  AI, AO, AU, and AE (Adder Add-One, Inexact,
  Overflow, Underflow, and Exponent, respectively)
  describe the last stage result of the adder. The
  high-order three bits of the 11-bit exponent of the
  adder result are stored in the AE field.


  The Adder Add One and Multiplier Add One bits
  indicate that the absolute value of the result frac-
  tion grew by one least-significant bit due to
  rounding. AA and MA are not affected by the
  sign of the result.


  After a floating-point operation in a given unit (ad-
  der or multiplier), the result-status bits of that unit
  are undefined until the point at which result ex-
  ceptions are reported.


  When written to the fsr with the U-bit set, the
  result-status bits are placed into the first stage of
  the adder and multiplier pipelines. When the
  processor executes pipelined operations, it prop-
  agates the result-status bits of a particular unit
  (multiplier or adder) one stage for each pipelined
  floating-point operation for that unit. When they
  reach the last stage, they replace the normal re-
  sult-status bits in the fsr. When the U-bit is not
  set, result-status bits in the word being written to
  the fsr are ignored.


  In a floating-point dual-operation instruction (e.g.
  add-and-multiply or subtract-and-multiply), both
  the multiplier and the adder may set exception
  bits. The result-status bits for a particular unit re-
  main set until the next operation that uses that
  unit.


• RR (Result Register) specifies which floating-
  point register (f0-f31) was the destination regis-
  ter when a result-exception trap occurs due to a
  scalar operation.


• LRP (Load Pipe Result Precision), IRP (Integer
  (Graphics) Pipe Result Precision), MRP (Multiplier
  Pipe Result Precision), and ARP (Adder Pipe Re-
  sult Precision) aid in restoring pipeline state after
  a trap or process switch. Each defines the preci-
  sion of the last stage result in the corresponding
  pipeline. One of these bits is set when the result
  in the last stage of the corresponding pipeline is
  double precision; it is cleared if the result is single
  precision. These bits cannot be changed by soft-
  ware.
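

The rounding rule stated for RM (choose between the bracketing values a and c of an unrepresentable true result b) can be sketched in C. This is an illustration of the selection rule only, using double values as stand-ins for a, b, and c; the RM encoding 0 to 3 in the order nearest, down, up, chop is an assumption, and the caller must say whether a has a zero least significant bit because that cannot be read from the stand-in values.

```c
#include <stdbool.h>

enum rounding_mode { RM_NEAREST = 0, RM_DOWN = 1, RM_UP = 2, RM_CHOP = 3 };

/* Magnitude without pulling in the math library. */
static double magnitude(double x)
{
    return (x < 0.0) ? -x : x;
}

/* Pick a or c for a true result b with a < b < c.
 * a_is_even: a's least significant bit is zero (c is then the odd one). */
static double round_select(double a, double b, double c,
                           enum rounding_mode rm, bool a_is_even)
{
    switch (rm) {
    case RM_DOWN:
        return a;                                   /* toward minus infinity */
    case RM_UP:
        return c;                                   /* toward plus infinity  */
    case RM_CHOP:
        return (magnitude(a) < magnitude(c)) ? a : c;   /* toward zero       */
    case RM_NEAREST:
    default:
        if (b - a < c - b)
            return a;
        if (c - b < b - a)
            return c;
        return a_is_even ? a : c;                   /* tie: pick the even one */
    }
}
```

For the bracket a = -2, c = -1 and b = -1.5, round down selects -2 while round up and chop both select -1.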


2.2.9 KR, KI, T, AND MERGE REGISTERS


The KR, KI, and T registers are special-purpose reg-
isters used by the dual-operation floating-point
instructions pfam, pfmam, pfsm, and pfmsm,
which initiate both an adder (A-unit) operation and a
multiplier (M-unit) operation. The KR, KI, and T regis-
ters can store values from one dual-operation in-
struction and supply them as inputs to subsequent
dual-operation instructions. (Refer to Figure 2.14.)


The MERGE register is used only by the graphics
instructions. The purpose of the MERGE register is
to accumulate (or merge) the results of multiple-ad-
dition operations that use as operands the color-in-
tensity values from pixels or distance values from a
Z-buffer. The accumulated results can then be
stored in one 64-bit operation.


Two multiple-addition instructions and an OR in-
struction use the MERGE register. The addition in-
structions are designed to add interpolation values
to each color-intensity field in an array of pixels or to
each distance value in a Z-buffer.


Refer to the instruction descriptions in section 8 for
more information about these registers.


2.3 Addressing


Memory is addressed in byte units with a paged vir-
tual-address space of 2^32 bytes. Data and instruc-
tions can be located anywhere in this address
space. Address arithmetic is performed using 32-bit
input values and produces 32-bit results. The low-or-
der 32 bits of the result are used in case of overflow.


Normally, multibyte data values are stored in memo-
ry in little endian format, i.e., with the least significant
byte at the lowest memory address. As an option,
the ordering can be dynamically selected by soft-
ware in supervisor mode. The i860 XR microproces-
sor also offers big endian mode, in which the most
significant byte of a data item is at the lowest ad-
dress. Figure 2.8 shows the difference between the
two storage modes. Big endian and little endian data
areas should not be mixed within a 64-bit data word.
Illustrations of data structures in this data sheet
show data stored in little endian mode, i.e., the low-
order byte is at the lowest memory address.
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Code accesses are always done with little endian 


addressing. This implies that code will appear differ-
ently than documented here when accessed as big
endian data. Intel recommends that disassemblers
running in a big endian system convert instructions
which have been read as data back to little endian
form and present them in the format documented
here.


Page directories and page tables are also accessed
in little endian mode, regardless of the value of the
BE bit.


Alignment requirements are as follows (any violation
results in a data-access trap):


• 128-bit values are aligned on 16-byte boundaries
  when referenced in memory (i.e. the four least
  significant address bits must be zero).


• 64-bit values are aligned on 8-byte boundaries
  when referenced in memory (i.e. the three least
  significant address bits must be zero).


• 32-bit values are aligned on 4-byte boundaries
  when referenced in memory (i.e. the two least
  significant address bits must be zero).


• 16-bit values are aligned on 2-byte boundaries
  when referenced in memory (i.e. the least signifi-
  cant address bit must be zero).
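

All four alignment rules above are the same test with a different operand size. A minimal C sketch, with a hypothetical function name and the operand size given in bytes (a power of two):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr satisfies the alignment requirement for an operand
 * of size_bytes (1, 2, 4, 8, or 16): the low-order log2(size_bytes)
 * address bits must be zero. A false result corresponds to the
 * data-access trap described above. */
static bool access_is_aligned(uint32_t addr, uint32_t size_bytes)
{
    return (addr & (size_bytes - 1u)) == 0u;
}
```

For example, address 0x1008 is acceptable for a 64-bit value but not for a 128-bit value.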


2.4 Virtual Addressing 


When address translation is enabled, the i860 XR
microprocessor maps instruction and data virtual ad-
dresses into physical addresses before referencing
memory. This address transformation is compatible
with that of the Intel386™ microprocessor and im-
plements the basic features needed for page-orient-
ed virtual-memory systems and page-level protec-
tion.


The address translation is optional. Address
translation is in effect only when the ATE bit of dirbase is
set. This bit is typically set by the operating system
during software initialization. The ATE bit must be
set if the operating system is to implement
page-oriented protection or page-oriented virtual memory.
Figure 2.9. Format of a Virtual Address (fields: DIR, PAGE, OFFSET)


Address translation is disabled when the processor
is reset. It is enabled when a store to dirbase sets
the ATE bit. It is disabled again when a store clears
the ATE bit.


2.4.1 PAGE FRAME 


A page frame is a 4-Kbyte unit of contiguous ad- 
dresses of physical main memory. Page frames be- 
gin on 4-Kbyte boundaries and are fixed in size. A 
page is the collection of data that occupies a page 
frame when that data is present in main memory. 


The data may also occupy some location in secondary
storage when there is not sufficient space in
main memory.


2.4.2 VIRTUAL ADDRESS 


A virtual address refers indirectly to a physical
address by specifying a page table, a page within that
table, and an offset within that page. Figure 2.9
shows the format of a virtual address.


Figure 2.10 shows how the i860 XR microprocessor
converts the DIR, PAGE, and OFFSET fields of a
virtual address into the physical address by consulting
two levels of page tables. The addressing mechanism
uses the DIR field as an index into a page
directory, uses the PAGE field as an index into the
page table determined by the page directory, and
uses the OFFSET field to address a byte within the
page determined by the page table.
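
The field extraction can be sketched as follows. This is an illustrative Python fragment, not part of the data sheet. It assumes the field widths implied by the table and page sizes given in this section (1K entries per table gives 10-bit DIR and PAGE indexes; 4-Kbyte pages give a 12-bit OFFSET), with DIR in the most significant bits. The function name is hypothetical.

```python
def split_virtual_address(va):
    """Split a 32-bit virtual address into (DIR, PAGE, OFFSET).

    Assumed layout: DIR = bits 31..22, PAGE = bits 21..12,
    OFFSET = bits 11..0.
    """
    va &= 0xFFFFFFFF
    dir_index = (va >> 22) & 0x3FF    # index into the page directory
    page_index = (va >> 12) & 0x3FF   # index into the selected page table
    offset = va & 0xFFF               # byte within the 4-Kbyte page
    return dir_index, page_index, offset


print(split_virtual_address(0x00403004))
```

For the address 0x00403004 the sketch selects directory entry 1, page-table entry 3, and byte 4 of the page.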


2.4.3 PAGE TABLES


A page table is simply an array of 32-bit page
specifiers. A page table is itself a page, and therefore
contains 4 Kbytes of memory or at most 1K 32-bit
entries.


Figure 2.10. Address Translation (blocks: PAGE DIRECTORY with DIR ENTRY, PAGE TABLE with PG TBL ENTRY, PAGE FRAME)
Two levels of tables are used to address a page of
memory. At the higher level is a page directory. The
page directory addresses up to 1K page tables of
the second level. A page table of the second level
addresses up to 1K pages. All the tables addressed
by one page directory, therefore, can address 1M
pages (2^20). Because each page contains 4 Kbytes
(2^12 bytes), the tables of one page directory can
span the entire physical address space of the i860
XR microprocessor (2^20 x 2^12 = 2^32).
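
The arithmetic above can be checked directly. The short Python fragment below is only a worked example; the constant names are illustrative.

```python
PAGE_SIZE = 4096                       # 4 Kbytes = 2**12 bytes
ENTRY_SIZE = 4                         # one 32-bit entry
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 1K entries per table

# One directory -> 1K page tables -> 1K pages each.
pages_per_directory = ENTRIES_PER_TABLE * ENTRIES_PER_TABLE
bytes_per_directory = pages_per_directory * PAGE_SIZE

print(ENTRIES_PER_TABLE, pages_per_directory, bytes_per_directory)
```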


The physical address of the current page directory is 
stored in the DTB field of the dirbase register. Memory
management software has the option of using one 
page directory for all processes, one page directory 
for each process, or some combination of the two. 


2.4.4 PAGE-TABLE ENTRIES


Page-table entries (PTEs) in either level of page
tables have the same format. Figure 2.11 illustrates
this format.
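
As an illustration of this format, the following Python sketch unpacks a 32-bit entry into named fields. It is not part of the data sheet. The bit positions are an assumption read from the field order of Figure 2.11 (P in bit 0, then W, U, WT, CD, A, D, two reserved bits, three AVAIL bits, and the page frame address in bits 31..12) and should be checked against the figure. The names `PTE_FLAGS` and `decode_pte` are hypothetical.

```python
# Assumed bit positions of the single-bit fields (bit 0 = least significant).
PTE_FLAGS = {"P": 0, "W": 1, "U": 2, "WT": 3, "CD": 4, "A": 5, "D": 6}


def decode_pte(pte):
    """Return a dict of the fields of a 32-bit page-table entry."""
    fields = {name: (pte >> bit) & 1 for name, bit in PTE_FLAGS.items()}
    fields["AVAIL"] = (pte >> 9) & 0x7    # bits 11..9, free for software use
    fields["PFA"] = pte & 0xFFFFF000      # page frame address, low 12 bits zero
    return fields


print(decode_pte(0x00123067))
```

The same decoder applies to directory entries, because both levels share one format.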


2.4.4.1 Page Frame Address 


The page frame address specifies the physical start- 
ing address of a page. Because pages are located 
on 4K boundaries, the low-order 12 bits are always 
zero. In a page directory, the page frame address is 
the address of a page table. In a second-level page 
table, the page frame address is the address of the 
page frame that contains the desired memory operand.


2.4.4.2 Present Bit 


The P (present) bit indicates whether a page table
entry can be used in address translation. P = 1
indicates that the entry can be used. When P = 0 in
either level of page tables, the entry is not valid for
address translation, and the rest of the entry is
available for software use; none of the other bits in the
entry is tested by the hardware. If P = 0 in either
level of page tables when an attempt is made to use
a page-table entry for address translation, the
processor signals either a data-access fault or an
instruction-access fault. In software systems that
support paged virtual memory, the trap handler can
bring the required page into physical memory.


Note that there is no P bit for the page directory
itself. The page directory may be not-present while
the associated process is suspended, but the
operating system must ensure that the page directory
indicated by the dirbase image associated with the
process is present in physical memory before the
process is dispatched.


2.4.4.3 Writable and User Bits 


The W (writable) and U (user) bits are used for
page-level protection, which the i860 XR microprocessor
performs at the same time as address translation.
The concept of privilege for pages is implemented
by assigning each page to one of two levels:

1. Supervisor level (U = 0), for the operating system
   and other systems software and related data.

2. User level (U = 1), for applications procedures
   and data.


The U bit of the psr indicates whether the i860 XR 
microprocessor is executing at user or supervisor 
level. The i860 XR microprocessor maintains the U 
bit of psr as follows: 


Figure 2.11. Format of a Page Table Entry (fields, from bit 31 down to bit 0: PAGE FRAME ADDRESS 31..12; AVAIL, available for systems program use; X; D; A; CD; WT; U; W; P)

NOTE:
X indicates Intel reserved. Do not use.
• The i860 XR microprocessor clears the psr U bit
  to indicate supervisor level when a trap occurs
  (including when the trap instruction causes the
  trap). The prior value of U is copied into PU.

• The i860 XR microprocessor copies the psr PU
  bit into the U bit when an indirect branch is
  executed and one of the trap bits is set. If PU was
  one, the i860 XR microprocessor enters user
  level.


With the U bit of psr and the W and U bits of the 
page table entries, the i860 XR microprocessor im- 
plements the following protection rules: 


• When at user level, a read or write of a
  supervisor-level page causes a trap.

• When at user level, a write to a page whose W bit
  is clear causes a trap.

• When at user level, st.c to certain control
  registers is ignored.


When the i860 XR microprocessor is executing at
supervisor level, all pages are addressable, but,
when it is executing at user level, only pages that
belong to the user level are addressable.


When the i860 XR microprocessor is executing at
supervisor level, all pages are readable. Whether a
page is writable depends upon the write-protection
mode controlled by WP of epsr:

WP = 0    All pages are writable.

WP = 1    A write to a page whose W bit is
          clear causes a trap.


When the i860 XR microprocessor is executing at 
user level, only pages that belong to user level and 
are marked writable are actually writable; pages that 
belong to supervisor level are neither readable nor 
writable from user level. 
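
The page-level rules above can be condensed into one predicate. The Python sketch below is illustrative only and covers a single page-table entry. It does not model the st.c rule for control registers, and the function and parameter names are hypothetical.

```python
def access_allowed(psr_u, wp, page_u, page_w, is_write):
    """Return True if the access is permitted, False if it would trap.

    psr_u    -- U bit of psr (True = executing at user level)
    wp       -- WP bit of epsr
    page_u   -- U bit of the page-table entry
    page_w   -- W bit of the page-table entry
    is_write -- True for a write, False for a read
    """
    if psr_u:
        # User level: supervisor pages are neither readable nor writable.
        if not page_u:
            return False
        # User pages are writable only when W is set.
        return bool(page_w) or not is_write
    # Supervisor level: every page is readable; writes are checked
    # against W only when WP = 1.
    if is_write and wp and not page_w:
        return False
    return True


print(access_allowed(psr_u=False, wp=True, page_u=True, page_w=False, is_write=True))
```

The call shown models a supervisor write to a read-only page with WP = 1, which the rules above treat as a trap.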


2.4.4.4 Write-Through Bit 


The i860 XR microprocessor does not implement a
write-through caching policy for the on-chip data
cache; however, the WT (write-through) bit in the
second-level page-table entry does determine internal
caching policy. If WT is set in a PTE, on-chip
caching of data from the corresponding page is
inhibited. The i860 XR CPU may place pages having
WT = 1 into the instruction cache. Future
implementations of the i860 XR architecture may adhere
to a write-through data caching policy. Therefore,
they may cache pages having the WT bit of the PTE
set. If WT is clear, the normal write-back policy is
applied to data from the page in the on-chip caches.
The WT bit of page directory entries is not
referenced by the processor, but is reserved.


The WT bit is independent of the CD bit; therefore, 
data may be placed in a second-level coherent 
cache, but kept out of the on-chip caches. 
2.4.4.5 Cache Disable Bit


If the CD (cache disable) bit in the second-level
page-table entry is set, data from the associated
page is not placed in instruction or data caches.
Clearing CD permits the cache hardware to place
data from the associated page into caches. The CD
bit of page directory entries is not referenced by the
processor, but is reserved.


To control external caches, the i860 XR microproc- 
essor outputs on its PTB pin either the CD or WT bit. 
The PBM bit of epsr determines which bit is output. 


2.4.4.6 Accessed and Dirty Bits 


The A (accessed) and D (dirty) bits provide data 
about page usage in both levels of the page tables. 


The i860 XR microprocessor sets the corresponding 
accessed bits in both levels of page tables before a 
read or write operation to a page. The processor 
tests the dirty bit in the second-level page table be- 
fore a write to an address covered by that page table 
entry, and, under certain conditions, causes traps. 
The trap handler then has the opportunity to main- 
tain appropriate values in the dirty bits. The dirty bit 
in directory entries is not tested by the i860 XR mi- 
croprocessor. The precise algorithm for using these 
bits is specified in Section 2.4.5. 


An operating system that supports paged virtual
memory can use these bits to determine what pages
to eliminate from physical memory when the demand
for memory exceeds the physical memory
available. The D and A bits in the PTE (page-table
entry) are normally initialized to zero by the operating
system. The processor sets the A bit when a
page is accessed either by a read or write operation.
When a data- or instruction-access fault occurs, the
trap handler sets the D bit if an allowable write is
being performed, then re-executes the instruction.


The operating system is responsible for coordinating
its updates to the accessed and dirty bits with updates
by the CPU and by other processors that may
share the page tables. The i860 XR microprocessor
automatically asserts the LOCK# signal while setting
the A bit. If an A-bit of a PTE is found not set
during a locked sequence (created by the lock
instruction), a trap will occur and the processor will not
update the A-bit.


2.4.4.7 Combining Protection of Both Levels of 
Page Tables 


For any one page, the protection attributes of its
page directory entry may differ from those of its
page table entry. The i860 XR microprocessor computes
the effective protection attributes for a page
by examining the protection attributes in both the
directory and the page table. Table 2.6 shows the
effective protection provided by the possible
combinations of protection attributes.


2.4.5 ADDRESS TRANSLATION ALGORITHM 


The algorithm below defines the translation of each 
virtual address to a physical address. Let DIR, 
PAGE, and OFFSET be the fields of the virtual ad- 
dress; let PFA1 and PFA2 be the page frame ad- 
dress fields of the first and second level page tables 
respectively; DTB is the page directory table base 
address stored in the dirbase register. 


1. Read the PTE (page table entry) at the physical
   address formed by DTB:DIR:00.

2. If P in the PTE is zero, generate a data- or
   instruction-access fault.

3. If W in the PTE is zero, the operation is a write,
   and either the U-bit of the psr is set or WP = 1,
   generate a data or instruction access fault.

4. If the U-bit in the PTE is zero and the U-bit in the
   psr is set, generate a data or instruction access
   fault.

5. If A in the PTE is zero, and if the TLB miss
   occurred while the bus was locked, generate a
   data or instruction access fault. (The trap allows
   software to set A to one and restart the sequence.
   This avoids ambiguity in determining
   what address corresponds to a locked semaphore
   for external bus hardware use.)

6. If A in the PTE is zero, and if the TLB miss
   occurred while the bus was not locked, assert
   LOCK#. Re-fetch and check the PTE, set A, and
   store the PTE. Deassert LOCK# during the store.

7. Locate the PTE at the physical address formed by
   PFA1:PAGE:00.

8. Perform the P, W, U, and A checks as in steps 2
   through 6 with the second-level PTE.

9. If D in the PTE is clear and the operation is a
   write, generate a data or instruction access fault.

10. Form the physical address as PFA2:OFFSET.
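
The ten steps can be traced with a small software model. The Python sketch below is an illustration under stated assumptions, not a description of the hardware implementation. Memory is modeled as a dictionary from physical address to 32-bit entry. The bit positions of P, W, U, A and D are assumed from the field order of Figure 2.11. LOCK# signalling and the TLB are not modeled. All names (`TranslationFault`, `check_entry`, `translate`) are hypothetical.

```python
class TranslationFault(Exception):
    """Stands in for the data- or instruction-access fault."""


# Assumed bit positions of the PTE fields used by the algorithm.
P, W, U, A, D = 1 << 0, 1 << 1, 1 << 2, 1 << 5, 1 << 6
PFA_MASK = 0xFFFFF000


def check_entry(memory, addr, is_write, psr_u, wp, bus_locked):
    """Steps 1-6 (or 7-8): read one entry and apply the P, W, U, A checks."""
    pte = memory[addr]
    if not (pte & P):
        raise TranslationFault("entry not present")
    if not (pte & W) and is_write and (psr_u or wp):
        raise TranslationFault("write to write-protected page")
    if not (pte & U) and psr_u:
        raise TranslationFault("user access to supervisor page")
    if not (pte & A):
        if bus_locked:
            raise TranslationFault("A bit clear during locked sequence")
        pte |= A                 # the processor does this under LOCK#
        memory[addr] = pte
    return pte


def translate(memory, dtb, va, is_write=False, psr_u=False, wp=False,
              bus_locked=False):
    """Translate virtual address `va`; `dtb` is the 4-Kbyte-aligned
    physical address of the page directory."""
    dir_index = (va >> 22) & 0x3FF
    page_index = (va >> 12) & 0x3FF
    offset = va & 0xFFF
    # Steps 1-6: first-level entry at DTB:DIR:00.
    pde = check_entry(memory, (dtb & PFA_MASK) | (dir_index << 2),
                      is_write, psr_u, wp, bus_locked)
    # Steps 7-8: second-level entry at PFA1:PAGE:00.
    pte = check_entry(memory, (pde & PFA_MASK) | (page_index << 2),
                      is_write, psr_u, wp, bus_locked)
    # Step 9: a write to a page whose D bit is clear faults.
    if is_write and not (pte & D):
        raise TranslationFault("dirty bit clear on write")
    # Step 10: PFA2:OFFSET.
    return (pte & PFA_MASK) | offset


tables = {0x1004: 0x2000 | P | W | U, 0x200C: 0x5000 | P | W | U | D}
print(hex(translate(tables, 0x1000, 0x00403004, is_write=True, psr_u=True)))
```

In the example, directory entry 1 points at a page table at 0x2000 and its entry 3 points at the page frame at 0x5000. After the call both entries have their A bit set, mirroring step 6.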


The i860 XR microprocessor looks only in external
memory for Page Directories and Page Tables in
the translation process. The data cache is not
searched. Therefore, any code which modifies Page
Directories or Page Tables must keep them out of
the cache. The tables should be kept in non-cacheable
memory, or flushed from the cache.


Table 2.6. Combining Directory and Page Protections

Column headings: Page Directory Entry; Page Table Entry; Combined Protection (User Access; Supervisor Access for WP = 0 and for WP = 1)

NOTES:
N = No access allowed
R = Read access only
R/W = Both reads and writes allowed
X = Don't care
The i860 XR microprocessor expects Page Directories
and Page Tables to be in little endian format.
The operating system must maintain these tables in
little endian format by either setting BE = 0 when
manipulating the tables or by complementing bit 2 of
the address when loading or storing entries.
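
Complementing bit 2 of an address is an exclusive-OR with 4, which selects the other 32-bit half of the same aligned 64-bit word. A minimal Python sketch follows; the function name is hypothetical and the fragment only illustrates the address arithmetic, not a complete table-access routine.

```python
def complement_bit_2(entry_address):
    """Return the address with bit 2 inverted, for software that
    accesses little endian table entries while BE = 1."""
    return entry_address ^ 0x4


print(hex(complement_bit_2(0x2000)), hex(complement_bit_2(0x200C)))
```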


2.4.6 ADDRESS TRANSLATION FAULTS 


The address translation fault is one instance of the 
data-access fault. The instruction causing the fault 
can be re-executed upon returning from the trap 
handler. 


2.4.7 PAGE TRANSLATION CACHE 


For greatest efficiency in address translation, the
i860 XR microprocessor stores the most recently
used page-table data in an on-chip cache called the
TLB (translation lookaside buffer). Only if the necessary
paging information is not in the cache must
both levels of page tables be referenced.


2.5 Caching and Cache Flushing 


The i860 XR microprocessor has the ability to cache
instruction, data, and address-translation information
in on-chip caches. Caching uses virtual-address
tags. The effects of mapping two different virtual
addresses in the same address space to the same
physical address are undefined.


Instruction, data, and address-translation caching on
the i860 XR microprocessor are not transparent.
Because the data cache uses a write-back protocol,
writes do not immediately update memory, and
writes to memory by other bus devices do not update
the cache. Changes to page tables do not
automatically update the TLB, and changes to instructions
do not automatically update the instruction
cache. Under certain circumstances, such as I/O
references, self-modifying code, page-table updates,
or shared data in a multiprocessing system, it
is necessary to bypass or to flush the caches. The
i860 XR microprocessor provides the following
methods for doing this:


• Bypassing Instruction and Data Caches. If
  deasserted during cache-miss processing, the
  KEN# pin disables instruction and data caching
  of the referenced data. If the CD bit of the associated
  second-level PTE is set, caching of data and
  instructions is disabled. The i860 XR CPU may
  place pages having WT = 1 into the instruction
  cache. Future implementations of the i860 XR
  architecture may adhere to a write-through data
  cache policy. Thus, they may cache pages having
  the WT bit of the PTE set. The value of the CD bit
  or the WT bit is output on the PTB pin for use by
  external caches.


• Invalidating Instruction and Address-Translation
  Caches. Storing to the dirbase register with
  the ITI bit set invalidates the contents of the
  instruction and address-translation caches. This bit
  should be set when modifying a page table, when
  modifying a page containing instructions, or when
  changing the DTB field of dirbase or the WP bit
  of the epsr. Note that in order to make the
  instruction or address-translation caches consistent
  with the data cache, the data cache must be
  flushed before invalidating the other caches.


  NOTE:
  The mapping of the page containing the
  currently executing instruction and the
  next six instructions should not be different
  in the new page tables when st.c dirbase
  changes DTB or activates ITI. The
  six instructions following the st.c should
  be nops and should lie in the same page
  as the st.c.


• Flushing the Data Cache. The data cache is
  flushed by a software routine using the flush
  instruction. The data cache must be flushed prior to
  invalidating the instruction or address-translation
  caches (as controlled by the ITI bit of dirbase) or
  enabling or disabling address translation (via the
  ATE bit). The data cache does not need flushing
  if the program is modifying only the P, U, W, A, or
  D bits of a PTE (as long as the Page Frame Address
  is not changed and the PTE itself was not
  in the data cache). The i860 XR CPU does not
  check these protection bits on cache line
  write-back. Thus, a trap handler can service a DAT for
  D-bit-zero by setting D = 1 and then ITI = 1. In
  the case of setting the P or A bits active, there is
  no need to invalidate or flush any caches because
  the processor does not load entries into
  the TLB that have P = 0 or A = 0. The i860 XR
  microprocessor searches only external memory
  for Page Directories and Page Tables in the
  translation process. The data cache is not
  searched. Therefore, Page Tables and Directories
  should be kept in non-cacheable memory, or
  flushed from the cache by any code which
  accesses them.
2.6 Instruction Set


Table 2.7 shows the complete set of instructions 
grouped by function within processing unit. Refer to 
Section 8 for an algorithmic definition of each in- 
struction. 

The architecture of the i860 XR microprocessor
uses parallelism to increase the rate at which operations
may be introduced into the unit. Parallelism in
the i860 XR microprocessor is not transparent; rather,
programmers have complete control over parallelism
and therefore can achieve maximum performance
for a variety of computational problems.


2.6.1 PIPELINED AND SCALAR OPERATIONS 


One type of parallelism used within the floating-point
unit is "pipelining". The pipelined architecture treats
each operation as a series of more primitive operations
(called "stages") that can be executed in parallel.
Consider just the floating-point adder unit as an
example. Let A represent the operation of the adder.
Let the stages be represented by A1, A2, and A3.
The stages are designed such that A(i+1) for one adder
instruction can execute in parallel with A(i) for the
next adder instruction. Furthermore, each A(i) can be
executed in just one clock. The pipelining within the
multiplier and graphics units can be described similarly,
except that the number of stages may be different.


Figure 2.12 illustrates three-stage pipelining as
found in the floating-point adder (also in the
floating-point multiplier when single-precision input operands
are employed). The columns of the figure represent
the three stages of the pipeline. Each stage holds
intermediate results and also (when introduced into the
first stage by software) holds status information
pertaining to those results. The figure assumes that the
instruction stream consists of a series of consecutive
floating-point instructions, all of one type (i.e. all
adder instructions or all single-precision multiplier
instructions). The instructions are represented as i,
i+1, etc. The rows of the figure represent the states
of the unit at successive clock cycles. Each time a
pipelined operation is performed, the result of the
last stage of the pipeline is stored in the destination
register fdest, the pipeline is advanced one stage,
and the input operands fsrc1 and fsrc2 are transferred
to the first stage of the pipeline.




In the i860 XR microprocessor, the number of pipeline
stages ranges from one to three. A pipelined
operation with a three-stage pipeline stores the result
of the third prior operation. A pipelined operation
with a two-stage pipeline stores the result of the second
prior operation. A pipelined operation with a
one-stage pipeline stores the result of the prior operation.
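
The relation between pipeline depth and the result that a pipelined operation stores can be modeled with a fixed-length queue. The Python sketch below is a simplification for illustration: each result is computed in full when the operation is issued and then merely shifted through the stages, whereas the hardware performs part of the work in each stage. The class name `Pipeline` and its `issue` method are hypothetical.

```python
from collections import deque
import operator


class Pipeline:
    """A pipeline of `stages` stages; stage contents start undefined (None)."""

    def __init__(self, stages):
        self.stages = deque([None] * stages, maxlen=stages)

    def issue(self, operation, src1, src2):
        """Start a new operation and return the value stored into fdest,
        i.e. the contents of the last stage before the pipeline advances."""
        stored = self.stages[-1]
        # appendleft on a full deque discards the last-stage entry,
        # which models advancing the pipeline by one stage.
        self.stages.appendleft(operation(src1, src2))
        return stored


adder = Pipeline(3)
print([adder.issue(operator.add, i, 10) for i in range(5)])
```

With three stages the fourth operation issued returns the result of the first, that is, the third prior operation. A `Pipeline(1)` returns the result of the immediately preceding operation.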


There are four floating-point pipelines: one for the 
multiplier, one for the adder, one for the graphics 
unit, and one for floating-point loads. The adder 
pipeline has three stages. The number of stages in 
the multiplier pipeline depends on the precision of 
the source operands in the pipeline. Single precision 
has three stages and double precision has two 
stages. The graphics unit has one stage for all preci- 
sions. The load pipeline has three stages for all pre- 
cisions. 


Changing the FZ (flush zero), RM (rounding mode),
or RR (result register) bits of fsr while there are results
in either the multiplier or adder pipeline produces
effects that are not defined.


2.6.1.1 Scalar Mode 


In addition to the pipelined execution mode, the i860
XR microprocessor also can execute floating-point
instructions in "scalar" mode. Most floating-point
instructions have both pipelined and scalar variants,
distinguished by a bit in the instruction encoding. In
scalar mode, the floating-point unit does not start a
new operation until the previous floating-point operation
is completed. The scalar operation passes
through all stages of its pipeline before a new operation
is introduced, and the result is stored automatically.
Scalar mode is used when the next operation
depends on results from the previous few
floating-point operations (or when the compiler or programmer
does not want to deal with pipelining).


2.6.1.2 Pipelining Status Information 


Result status information in the fsr consists of the
AA, AI, AO, AU, and AE bits, in the case of the adder,
and the MA, MI, MO, and MU bits, in the case of
the multiplier. This information arrives at the fsr via
the pipeline in one of two ways:
Table 2.7. Instruction Set

Core Unit

Mnemonic     Description

Load and Store Instructions
ld.x         Load integer
st.x         Store integer
fld.y        F-P load
pfld.z       Pipelined F-P load
fst.y        F-P store
pst.d        Pixel store

Register to Register Moves
ixfr         Transfer integer to F-P register

Integer Arithmetic Instructions
addu         Add unsigned
adds         Add signed
subu         Subtract unsigned
subs         Subtract signed

Shift Instructions
shl          Shift left
shr          Shift right
shra         Shift right arithmetic
shrd         Shift right double

Logical Instructions
and          Logical AND
andh         Logical AND high
andnot       Logical AND NOT
andnoth      Logical AND NOT high
or           Logical OR
orh          Logical OR high
xor          Logical exclusive OR
xorh         Logical exclusive OR high

Control-Transfer Instructions
trap         Software trap
intovr       Software trap on integer overflow
br           Branch direct
bri          Branch indirect
bc           Branch on CC
bc.t         Branch on CC taken
bnc          Branch on not CC
bnc.t        Branch on not CC taken
bte          Branch if equal
btne         Branch if not equal
bla          Branch on LCC and add
call         Subroutine call
calli        Indirect subroutine call

System Control Instructions
flush        Cache flush
ld.c         Load from control register
st.c         Store to control register
lock         Begin interlocked sequence
unlock       End interlocked sequence

Floating-Point Unit

Mnemonic     Description

F-P Multiplier Instructions
fmul.p       F-P multiply
pfmul.p      Pipelined F-P multiply
pfmul3.dd    3-Stage pipelined F-P multiply
fmlow.dd     F-P multiply low
frcp.p       F-P reciprocal
frsqr.p      F-P reciprocal square root

F-P Adder Instructions
fadd.p       F-P add
pfadd.p      Pipelined F-P add
famov.r      F-P adder move
pfamov.r     Pipelined F-P adder move
fsub.p       F-P subtract
pfsub.p      Pipelined F-P subtract
pfgt.p       Pipelined F-P greater-than compare
pfeq.p       Pipelined F-P equal compare
fix.p        F-P to integer conversion
pfix.p       Pipelined F-P to integer conversion
ftrunc.p     F-P to integer truncation
pftrunc.p    Pipelined F-P to integer truncation

Dual-Operation Instructions
pfam.p       Pipelined F-P add and multiply
pfsm.p       Pipelined F-P subtract and multiply
pfmam.p      Pipelined F-P multiply with add
pfmsm.p      Pipelined F-P multiply with subtract

Long Integer Instructions
fisub.z      Long-integer subtract
pfisub.z     Pipelined long-integer subtract
fiadd.z      Long-integer add
pfiadd.z     Pipelined long-integer add

Graphics Instructions
fzchks       16-bit Z-buffer check
pfzchks      Pipelined 16-bit Z-buffer check
fzchkl       32-bit Z-buffer check
pfzchkl      Pipelined 32-bit Z-buffer check
faddp        Add with pixel merge
pfaddp       Pipelined add with pixel merge
faddz        Add with Z merge
pfaddz       Pipelined add with Z merge
form         OR with MERGE register
pform        Pipelined OR with MERGE register

Assembler Pseudo-Operations

Mnemonic     Description

mov          Integer register-register move
fmov.r       F-P reg-reg move
pfmov.r      Pipelined F-P reg-reg move
nop          Core no-operation
fnop         F-P no-operation
pfle.p       Pipelined F-P less-than or equal
Figure 2.12. Pipelined Instruction Execution


1. It is calculated by the last stage of the pipeline.
   This is the normal case.

2. It is propagated from the first stage of the pipeline.
   This method is used when restoring the state
   of the pipeline after a preemption. When a store
   instruction updates the fsr and the value of the
   U bit in the word being written into the fsr is set,
   the store updates the result status bits in the first
   stage of both the adder and multiplier pipelines.
   When software changes the result-status bits of
   the first stage of a particular unit (multiplier or adder),
   the updated result-status bits are propagated
   one stage for each pipelined floating-point operation
   for that unit. In this case, each stage of the
   adder and multiplier pipelines holds its own copy
   of the relevant bits of the fsr. When they reach
   the last stage, they override the normal
   result-status bits computed from the last stage result.
At the next floating-point instruction (or at certain
core instructions) after the result reaches the last
stage, the i860 XR microprocessor traps if any of the
status bits of the fsr indicate exceptions. Note that
the instruction that creates the exceptional condition
is not the instruction at which the trap occurs.


2.6.1.3 Precision in the Pipelines 


In pipelined mode, when a floating-point operation is
initiated, the result of an earlier pipelined
floating-point operation is returned. The result precision of
the current instruction applies to the operation being
initiated. The precision of the value stored in fdest is
that which was specified by the instruction that initiated
that operation.




Figure 2.13. Dual-Instruction Mode Transitions (annotations: enter dual-instruction mode; initiate exit from dual-instruction mode; leave dual-instruction mode; temporary dual-instruction mode)


If fdest is the same as fsrc1 or fsrc2, the value being
stored in fdest is used as the input operand. In this
case, the precision of fdest must be the same as the
source precision.


The multiplier pipeline has two stages when the
source operand is double-precision and three stages
when the precision of the source operand is single.
This means that a pipelined multiplier operation
stores the result of the second previous multiplier
operation for double-precision inputs and third previous
for single-precision inputs (except when changing
precisions).


2.6.1.4 Transition between Scalar and Pipelined
Operations


When a scalar operation is executed, it passes
through all stages of the pipeline; therefore, any
unstored results in the affected pipeline are lost. To
avoid losing information, the last pipelined operations
before a scalar operation should be dummy
pipelined operations that unload unstored results
from the affected pipeline.


After a scalar operation, the values of all pipeline
stages of the affected unit (except the last) are
undefined. No spurious result-exception traps result
when the undefined values are subsequently stored
by pipelined operations; however, the values should
not be referenced as source operands.


For best performance a scalar operation should not
immediately precede a pipelined operation whose
fdest is nonzero.


2.6.2 DUAL-INSTRUCTION MODE


Another form of parallelism results from the fact that 
the i860 XR microprocessor can execute both a 
floating-point and a core instruction simultaneously. 
Such parallel execution is called dual-instruction 
mode. When executing in dual-instruction mode, the 
instruction sequence consists of 64-bit aligned in- 
structions with a floating-point instruction in the low- 
er 32 bits and a core instruction in the upper 32 bits. 
Table 2.7 identifies which instructions are executed
by the core unit and which by the floating-point unit.
Programmers specify dual-instruction mode either
by including in the mnemonic of a floating-point
instruction a d. prefix or by using the Assembler
directives .dual ... .enddual. Both of the specifications
cause the D-bit of floating-point instructions to be
set. If the i860 XR microprocessor is executing in
single-instruction mode and encounters a floating-point
instruction with the D-bit set, one more 32-bit
instruction is executed before dual-mode execution
begins. If the i860 XR microprocessor is executing in
dual-instruction mode and a floating-point instruction
is encountered with a clear D-bit, then one more pair
of instructions is executed before resuming
single-instruction mode. Figure 2.13 illustrates two variations
of this sequence of events: one for extended sequences
of dual-instructions and one for a single
instruction pair.
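
The mode transitions can be sketched as a grouping of the instruction stream into issue slots. The Python fragment below is an illustrative model, not a description of the hardware. It assumes that the D-bit of a floating-point instruction takes effect two issue slots later, which reproduces the "one more instruction" and "one more pair" rules stated above. Traps and branches are not modeled, and the function name and instruction labels are hypothetical.

```python
def group_issue_slots(stream):
    """Group instructions into issue slots.

    `stream` is a list of (label, is_fp, d_bit) tuples in program order.
    Returns a list of tuples of labels, one tuple per issue slot.
    """
    slots = []
    dual_now = False    # mode of the slot about to be issued
    dual_next = False   # mode already decided for the slot after it
    i = 0
    while i < len(stream):
        width = 2 if dual_now else 1
        slot = stream[i:i + width]
        i += width
        slots.append(tuple(label for label, _, _ in slot))
        fp_ops = [op for op in slot if op[1]]
        # The D-bit of this slot's floating-point instruction selects the
        # mode of the slot after next; without one, the mode persists.
        dual_after = fp_ops[0][2] if fp_ops else dual_next
        dual_now, dual_next = dual_next, dual_after
    return slots


program = [
    ("d.fp-op", True, True),    # D-bit set while in single-instruction mode
    ("core-op", False, False),  # one more 32-bit instruction
    ("d.fp-op", True, True), ("core-op", False, False),   # dual pair
    ("fp-op", True, False), ("core-op", False, False),    # initiates the exit
    ("fp-op", True, False), ("core-op", False, False),    # one more pair
    ("core-op", False, False),  # back in single-instruction mode
]
for slot in group_issue_slots(program):
    print(slot)
```

In the example, the first D-bit is followed by one single instruction and then by pairs; the first pair whose floating-point instruction has a clear D-bit is followed by one more pair before single-instruction issue resumes.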


When a 64-bit dual-instruction pair sequentially follows
a delayed branch instruction in dual-instruction
mode, both 32-bit instructions are executed.


2.6.3 DUAL-OPERATION INSTRUCTIONS 


Special dual-operation floating-point instructions 
(add-and-multiply, subtract-and-multiply) use both 
the multiplier and adder units within the floating- 
point unit in parallel to efficiently execute such com- 
mon tasks as evaluating systems of linear equa- 
tions, performing the Fast Fourier Transform (FFT), 
and performing graphics transformations.


The instructions pfam fsrc1, fsrc2, fdest (add and
multiply), pfsm fsrc1, fsrc2, fdest (subtract and multiply),
pfmam fsrc1, fsrc2, fdest (multiply and add),
and pfmsm fsrc1, fsrc2, fdest (multiply and subtract)
initiate both an adder operation and a multiplier
operation. Six operands are required, but the instruction
format specifies only three operands; therefore,
there are special provisions for specifying the
operands. These special provisions consist of:


• Three special registers (KR, KI, and T) that can store
  values from one dual-operation instruction and supply them
  as inputs to subsequent dual-operation instructions.


  1. The constant registers KR and KI can store the value of
     fsrc1 and subsequently supply that value to the
     multiplier pipeline in place of fsrc1.


  2. The transfer register T can store the last-stage result
     of the multiplier pipeline and subsequently supply that
     value to the adder pipeline in place of fsrc1.


• A four-bit data-path control field in the opcode (DPC)
  that specifies the operands and loading of the special
  registers.


  1. Operand-1 of the multiplier can be KR, KI, or fsrc1.


  2. Operand-2 of the multiplier can be fsrc2 or the
     last-stage result of the adder pipeline.


  3. Operand-1 of the adder can be fsrc1, the T-register, or
     the last-stage result of the adder pipeline.


  4. Operand-2 of the adder can be fsrc2, the last-stage
     result of the multiplier pipeline, or the last-stage
     result of the adder pipeline.
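

The operand choices listed above can be illustrated with a small
software model. This is only a sketch under stated assumptions: the
selector strings, the FpState class, and the dual_op function are
hypothetical names, each pipeline is collapsed to a single stage, the
loading of KR, KI, and T is not modeled, and the actual DPC encodings
are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class FpState:
    """Special registers and last-stage results (illustrative model only)."""
    KR: float = 0.0
    KI: float = 0.0
    T: float = 0.0
    mul_last: float = 0.0  # last-stage result of the multiplier pipeline
    add_last: float = 0.0  # last-stage result of the adder pipeline


# Operand sources that the text allows for each unit input.
MUL_OP1 = {"KR", "KI", "fsrc1"}
MUL_OP2 = {"fsrc2", "add_last"}
ADD_OP1 = {"fsrc1", "T", "add_last"}
ADD_OP2 = {"fsrc2", "mul_last", "add_last"}


def dual_op(state, fsrc1, fsrc2, m1, m2, a1, a2, subtract=False):
    """Run one multiplier and one adder operation in parallel.

    Both units read the state as it was before the instruction.
    Assumption: each pipeline is modeled as a single stage.
    """
    checks = ((m1, MUL_OP1), (m2, MUL_OP2), (a1, ADD_OP1), (a2, ADD_OP2))
    for selector, allowed in checks:
        if selector not in allowed:
            raise ValueError(f"operand source {selector!r} is not permitted here")

    def pick(selector):
        if selector == "fsrc1":
            return fsrc1
        if selector == "fsrc2":
            return fsrc2
        return getattr(state, selector)

    product = pick(m1) * pick(m2)
    left, right = pick(a1), pick(a2)
    total = left - right if subtract else left + right
    return FpState(state.KR, state.KI, state.T, product, total)
```

For example, with KR holding 2.0 and a previous multiplier result of
5.0, selecting KR and fsrc2 for the multiplier and fsrc1 and the
multiplier result for the adder yields a new product of KR * fsrc2
and a sum of fsrc1 + 5.0.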


Figure 2.14 shows all the possible data paths surrounding the
adder and multiplier. The DPC field in these instructions
selects different data paths. Table 8.8 shows the various
encodings of the DPC field.


Refer to the Dual-Operation Instructions section in the
i860 Microprocessor Programmer's Reference Manual for a
pictorial description.


[Figure: multiplier unit and adder unit, each with operand
inputs and a RESULT output; labels include SRC2, OP1, and
RDEST]

Figure 2.14. Dual-Operation Data Paths


Note that the mnemonics pfam.p, pfsm.p, 
pfmam.p, and pfmsm.p are never used as such in 
the assembly language; these mnemonics are used 
here to designate classes of related instructions. 
Each value of DPC has a unique mnemonic associ- 
ated with it. 


2.7. Addressing Modes 


Data access is limited to load and store instructions. 
Memory addresses are computed from two fields of 
load and store instructions: isrc1 and isrc2.


1. isrc1 either contains the identifier of a 32-bit integer
   register or contains an immediate 16-bit address offset.


2. isrc2 always specifies a register.
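

A minimal sketch of the address computation follows. It assumes
that the memory address is the 32-bit sum of the two fields and that
a 16-bit immediate offset is sign-extended before the addition; both
are assumptions of this example, as are the function names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed offset (assumption)."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(isrc2_value, isrc1_value=0, isrc1_is_immediate=False):
    """Return the 32-bit address formed from the isrc1 and isrc2 fields.

    isrc2_value is the content of the register named by isrc2.
    isrc1_value is either a register content or a 16-bit immediate.
    """
    first = sign_extend16(isrc1_value) if isrc1_is_immediate else isrc1_value
    return (first + isrc2_value) & MASK32
```

Under these assumptions, an immediate of 0x8000 combined with a zero
register wraps to 0xFFFF8000, an address near the top of the 32-bit
address space.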


Because either isrc1 or isrc2 may be null (zero), a variety of
useful addressing modes result:

offset + register    Useful for accessing fields within a
                     record, where register points to the
                     beginning of the record. Also useful for
                     accessing items in a stack frame, where
                     register is r3, the register used for
                     pointing to the beginning of the stack
                     frame.

register + register  Useful for two-dimensional arrays or for
                     array access within the stack frame.

register             Useful as the end result of any arbitrary
                     address calculation.

offset               Absolute address into the first or last
                     32K of the logical address space.

In addition, the floating-point load and store instructions may
select autoincrement addressing. In this mode isrc2 is replaced
by the sum of isrc1 and isrc2 after performing the load or
store. This mode makes stepping through arrays more efficient,
because it eliminates one address-calculation instruction.


2.8 Traps and Interrupts


Traps are caused by exceptional conditions detected in programs
or by external interrupts. Traps cause interruption of normal
program flow to execute a special program known as a trap
handler. Traps are divided into the types shown in Table 2.8.
Interrupts and traps start execution in single-instruction mode
at virtual address 0xFFFFFF00 in supervisor level (U = 0).


Table 2.8. Types of Traps

Type                Indication        Caused by
                    (psr, epsr)       Condition                  Instruction
------------------  ----------------  -------------------------  -------------------------
Instruction Fault   IT                Software traps             trap, intovr
                    IT, IL            Missing unlock             Any

Floating-Point      FT                Floating-point source      Any M- or A-unit except
Fault                                 exception                  fmlow
                    FT                Floating-point result      Any M- or A-unit except
                                      exception: overflow,       fmlow, pfgt, and pfeq.
                                      underflow, inexact         Reported on any F-P
                                      result                     instruction plus pst,
                                                                 fst, and sometimes fld,
                                                                 pfld, ixfr

Instruction         IAT               Address translation        Any
Access Fault                          exception during
                                      instruction fetch

Data Access Fault   DAT*              Load/store address         Any load/store
                                      translation exception
                                      Misaligned operand         Any load/store
                                      address
                                      Operand address matches    Any load/store
                                      db register

Interrupt           IN                External interrupt

Reset               No trap bits set  Hardware RESET signal

NOTES:
*These cases can be distinguished by examining the operand addresses.
The IL bit of the epsr must be checked by the trap handler to tell if
the bus is currently in a locked sequence.


2.8.1 TRAP HANDLER INVOCATION


This section applies to traps other than reset. When a trap
occurs, execution of the current instruction is aborted. The
instruction is restartable. The processor takes the following
steps while transferring control to the trap handler:


1. Copies U (user mode) of the psr into PU (previous U).

2. Copies IM (interrupt mode) into PIM (previous IM).

3. Sets U to zero (supervisor mode).

4. Sets IM to zero (interrupts disabled).

5. If the processor is in dual-instruction mode, it sets DIM;
   otherwise it clears DIM.

6. If the processor is in single-instruction mode and the next
   instruction will be executed in dual-instruction mode or if
   the processor is in dual-instruction mode and the next
   instruction will be executed in single-instruction mode, DS
   is set; otherwise, it is cleared.

7. The appropriate trap type bits in psr are set (IT, IN, IAT,
   DAT, FT). Several bits may be set if the corresponding trap
   conditions occur simultaneously.

8. An address is placed in the fault instruction register (fir)
   to help locate the trapped instruction. In single-instruction
   mode, the address in fir is the address of the trapped
   instruction itself. In dual-instruction mode, the address in
   fir is that of the floating-point half of the dual
   instruction. If an instruction or data access fault occurred,
   the associated core instruction is the high-order half of the
   dual instruction (fir + 4). In dual-instruction mode, when a
   data access fault occurs in the absence of other trap
   conditions, the floating-point half of the dual instruction
   will already have been executed.


The processor begins executing the trap handler by transferring
execution to virtual address 0xFFFFFF00. The trap handler begins
execution in single-instruction mode. The trap handler must
examine the trap-type bits in psr (IT, IN, IAT, DAT, FT) to
determine the cause or causes of the trap.
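

The invocation steps can be summarized in a short software model.
This is a sketch, not a description of the hardware: the Psr class
and the enter_trap and trap_causes functions are hypothetical names,
the psr is modeled as named one-bit fields rather than a packed
register, and trapped_address stands for whichever address the rules
of step 8 select.

```python
from dataclasses import dataclass

TRAP_BITS = ("IT", "IN", "IAT", "DAT", "FT")
TRAP_VECTOR = 0xFFFFFF00


@dataclass
class Psr:
    """Only the psr fields mentioned in the invocation steps."""
    U: int = 0
    PU: int = 0
    IM: int = 0
    PIM: int = 0
    DIM: int = 0
    DS: int = 0
    IT: int = 0
    IN: int = 0
    IAT: int = 0
    DAT: int = 0
    FT: int = 0


def enter_trap(psr, causes, dual_now, dual_next, trapped_address):
    """Apply steps 1 to 8 and return (fir, address of first handler instruction)."""
    psr.PU = psr.U                      # step 1
    psr.PIM = psr.IM                    # step 2
    psr.U = 0                           # step 3: supervisor mode
    psr.IM = 0                          # step 4: interrupts disabled
    psr.DIM = 1 if dual_now else 0      # step 5
    psr.DS = 1 if dual_now != dual_next else 0  # step 6: mode is about to switch
    for name in causes:                 # step 7: several bits may be set at once
        if name not in TRAP_BITS:
            raise ValueError(f"unknown trap bit {name!r}")
        setattr(psr, name, 1)
    fir = trapped_address               # step 8
    return fir, TRAP_VECTOR


def trap_causes(psr):
    """What a handler does first: list the trap-type bits that are set."""
    return [name for name in TRAP_BITS if getattr(psr, name)]
```

A handler written along these lines would treat an empty list from
trap_causes as the reset case, since reset is the only entry with no
trap bits set.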


2.8.2 INSTRUCTION FAULT 


This fault is caused by any of the following condi- 
tions. In all cases the processor sets the IT bit be- 
fore entering the trap handler. 


1. By the trap instruction. When trap is executed in 
dual-instruction mode, the floating-point compan- 
ion of the trap instruction is not executed before 
the trap is taken. 


2. By the intovr instruction. The trap occurs only if OF in
   epsr is set when intovr is executed. The trap handler should
   clear OF before returning. When intovr causes a trap in
   dual-instruction mode, the floating-point companion of the
   intovr instruction is completely executed before the trap is
   taken.

3. By violation of lock/unlock protocol, explained below. (Note
   that trap and intovr should not be used within a locked
   sequence; otherwise, it would be difficult to distinguish
   between this and the prior cases.)


The lock protocol requires the following sequence of
activities:


1. lock

2. Any load or store instruction that misses the cache

3. unlock

4. Any load or store instruction (regardless of whether it
   misses the cache)


There may be other instructions between any of these steps. The
bus is locked after step 2 and remains locked until step 4.
Step 4 must follow step 1 by 30 instructions or less; otherwise
the instruction trap occurs. In case of a trap, IL is also set.
If the load or store instruction in step 2 hits the cache, the
sequence is legal, but the bus is not locked.
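

The length limit of the lock protocol can be checked on an
instruction listing with a sketch like the following. The function
name and the string tags ("lock", "unlock", "ld", "st") are
hypothetical, and counting the distance as the index difference from
the lock instruction is an assumption of this example; the exact
counting used by the hardware should be taken from the architecture
documentation.

```python
MAX_LOCK_SPAN = 30  # instructions allowed between step 1 and step 4


def lock_sequence_ok(instructions):
    """Return True if the listing completes the lock protocol in time.

    instructions is a list of tags that starts with "lock". Loads and
    stores are tagged "ld" and "st"; any other tag is an unrelated
    instruction. A listing that never reaches step 4 returns False.
    """
    if not instructions or instructions[0] != "lock":
        raise ValueError("listing must start with the lock instruction")
    try:
        unlock_at = instructions.index("unlock", 1)   # step 3
    except ValueError:
        return False
    for i in range(unlock_at + 1, len(instructions)):
        if instructions[i] in ("ld", "st"):           # step 4
            return i <= MAX_LOCK_SPAN
    return False
```

The check does not require a load or store between lock and unlock,
because a step-2 access that hits the cache still leaves the
sequence legal.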


2.8.3 FLOATING-POINT FAULT


The floating-point fault is reported on floating-point
instructions, pst, fst, and sometimes fld, pfld, ixfr.


The floating-point faults of the i860 XR microproces- 
sor support the floating-point exceptions defined by 
the IEEE standard as well as some other useful 
classes of exceptions. The i860 XR microprocessor 
divides these into two classes: source exceptions 
and result exceptions. The numerics library supplied 
by Intel provides the IEEE standard default handling 
for all these exceptions. 


2.8.3.1 Source Exception Faults


When used as inputs to the multiplier or adder, all exceptional
operands, including infinities, denormalized numbers and NaNs,
cause a floating-point fault and set SE in the fsr. Source
exceptions are reported on the instruction that initiates the
operation. For pipelined operations, the pipeline is not
advanced.


The SE value is undefined for faults on fld, pfld, fst, pst,
and ixfr instructions when in single-instruction mode or when
in dual-instruction mode and the companion instruction is not a
multiplier or adder operation.


2.8.3.2 Result Exception Faults 


The class of result exceptions includes any of the 
following conditions: 


• Overflow. The absolute value of the rounded true result
  would exceed the largest positive finite number in the
  destination format.


• Underflow (when FZ is clear). The absolute value of the
  rounded true result would be smaller than the smallest
  positive finite number in the destination format.


• Inexact result (when TI is set). The result is not exactly
  representable in the destination format. For example, the
  fraction 1/3 cannot be precisely represented in binary form.
  This exception occurs frequently and indicates that some
  (generally acceptable) accuracy has been lost.


The point at which a result exception is reported depends upon
whether pipelined operations are being used:


• Scalar (nonpipelined) operations. Result exceptions are
  reported on the next floating-point, fst.x, or pst.x (and
  sometimes fld, pfld, ixfr) instruction after the scalar
  operation. When a trap occurs, the last stage of the affected
  unit contains the result of the scalar operation.


• Pipelined operations. Result exceptions are reported when
  the result is in the last stage and the next floating-point,
  fst.x or pst.x (and sometimes fld, pfld, ixfr) instruction is
  executed. When a trap occurs, the pipeline is not advanced,
  and the last-stage results (that caused the trap) remain
  unchanged.


When no trap occurs (either because FTE is clear or because no
exception occurred), the pipeline is advanced normally by the
new floating-point operation.


The result-status bits of the affected unit are undefined until
the point that result exceptions are reported. At this point,
the last-stage result-status bits (bits 29..22 and 16..9 of the
fsr) reflect the values in the last stages of both the adder
and multiplier. For example, if the last-stage result in the
multiplier has overflowed and a pipelined floating-point pfadd
is started, a trap occurs and MO is set.


For scalar operations, the RR bits of fsr specify the register
in which the result was stored. RR is updated when the scalar
instruction is initiated. The trap, however, occurs on a
subsequent instruction. Programmers must prevent intervening
stores to fsr from modifying the RR bits. Prevention may take
one of the following forms:


• Before any store to fsr when a result exception may be
  pending, execute a dummy floating-point operation to trigger
  the result-exception trap.


• Always read from fsr before storing to it, and mask updates
  so that the RR bits are not changed.
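

The second form, a masked update, can be sketched as follows. The
position and width of the RR field (bits 21..17 here) are
assumptions made for this example and must be checked against the
fsr layout in the register description; the function name is also
hypothetical.

```python
RR_SHIFT = 17   # assumed position of the RR field; verify against the fsr layout
RR_WIDTH = 5    # assumed width of the RR field
RR_MASK = ((1 << RR_WIDTH) - 1) << RR_SHIFT
MASK32 = 0xFFFFFFFF


def fsr_value_to_store(current_fsr, requested_fsr):
    """Merge a requested fsr value with the RR bits read from the current fsr.

    Every bit outside RR comes from requested_fsr; the RR bits are
    carried over unchanged from current_fsr.
    """
    keep_rr = current_fsr & RR_MASK
    new_bits = requested_fsr & ~RR_MASK & MASK32
    return new_bits | keep_rr
```

The value returned is what software would write back, so a pending
result exception still finds the register number it recorded.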


For pipelined operations, RR is cleared and the result is in
the last stage of the pipeline of the appropriate unit. The
trap handler must flush the pipeline, saving the results and
the status bits.


In either pipelined or scalar mode, the trap handler must then
compute the trapping result. In either case, the result has the
same fraction as the true result and has an exponent which is
the low-order bits of the true result's exponent. The trap
handler can inspect the result, compute the result appropriate
for that instruction (a NaN or an infinity, for example), and
store the correct result. The result is either stored in the
register specified by RR (if nonzero) or (if RR = 0) the trap
handler must reload the pipeline with the saved results and
status bits.


Result exceptions may be reported for both the adder and
multiplier units at the same time. In this case, the trap
handler should fix up the last stage of both pipelines.


2.8.4 INSTRUCTION ACCESS FAULT 


This trap occurs during address translation for instruction
fetches in any of these cases:


• The address fetched is in a page whose P (present) bit in
  the page table is clear (not present).


• The address fetched is in a supervisor mode page, but the
  processor is in user mode.


• The address fetched is in a page whose PTE has A = 0, and
  the access occurs during a locked sequence (i.e., between
  lock and unlock).


Note that several instructions are fetched at one time, either
due to instruction prefetching or to instruction caching.
Therefore, a trap handler can change from supervisor to user
mode and continue to execute instructions fetched from a
supervisor page. An instruction access trap occurs only when
the next group of instructions is fetched from a supervisor
page (up to eight instructions later). If, in the meantime, the
handler branches to a user page, no instruction access trap
occurs. No protection violation results, because the processor
does not permit data accesses to supervisor pages while in user
mode.


2.8.5 DATA ACCESS FAULT 


This trap results from an abnormal condition detected during
data operand fetch or store. Such an exception can be due only
to one of the following causes:


• An attempt is being made to write to a page whose D (Dirty)
  bit is clear.


• A memory operand is misaligned (is not located at an address
  that is a multiple of the length of the data).


• The address stored in the db register is equal to one of the
  addresses spanned by the operand.


• The operand is in a not-present page.


• An attempt is being made from user level to write to a
  read-only page or to access a supervisor-level page.


• The operand was in a page whose PTE had A = 0, and the access
  occurred during a locked sequence (i.e., between lock and
  unlock).


• Write protection (determined by epsr bit WP = 1) is violated
  in supervisor mode.
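

Two of these causes, misalignment and a match against the db
register, are simple address tests. The sketch below shows them as
plain functions; the function names are hypothetical, and the other
causes (page-table state and protection) are not modeled.

```python
def is_misaligned(address, length):
    """True if address is not a multiple of the operand length in bytes."""
    return address % length != 0


def matches_db(address, length, db):
    """True if db equals one of the addresses spanned by the operand."""
    return address <= db < address + length
```

A handler could apply tests of this kind to the operand address of
the trapped load or store to tell these causes apart.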


2.8.6 INTERRUPT TRAP 


An interrupt is an event that is signaled from an external
source. If the processor is executing with interrupts enabled
(IM set in the psr), the processor sets the interrupt bit IN in
the psr, and generates an interrupt trap. Vectored interrupts
are implemented by interrupt controllers and software.


2.8.7 RESET TRAP 


When the i860 XR microprocessor is reset, execution begins in
single-instruction mode at physical address 0xFFFFFF00. This is
the same address as for other traps. The reset trap can be
distinguished from other traps by the fact that no trap bits
are set. The instruction cache is flushed. The bits DPS, BL,
and ATE in dirbase are cleared. CS8 is initialized by the value
at the INT pin at the end of reset. The read-only fields of the
epsr are set to identify the processor, while the IL, WP, and
PBM bits are cleared. The bits U, IM, BR, and BW in psr are
cleared, as are the trap bits FT, DAT, IAT, IN, and IT. All
other bits of psr and all other register contents are
undefined.


Refer to Table 2.9 for a summary of these initial settings.


Table 2.9. Register and Cache Values after Reset

Registers                  Initial Value
-------------------------  ------------------------------------
Integer Registers          Undefined
Floating-Point Registers   Undefined
psr                        U, IM, BR, BW, FT, DAT, IAT, IN,
                           IT = 0; others are undefined
epsr                       IL, WP, PBM, BE = 0; Processor Type,
                           Stepping Number, DCS are read only;
                           others are undefined
dirbase                    DPS, BL, ATE = 0; others are
                           undefined
All other registers        Undefined
Instruction Cache          Flushed
Data Cache                 Undefined
TLB                        Flushed


The software must ensure that the data cache is flushed and
control registers are properly initialized before performing
operations that depend on the values of the cache or registers.
The data cache has no "validity" bits, so memory accesses
before the flush may result in false data cache hits.


Reset code must initialize the floating-point pipeline 
state to zero with floating-point traps disabled to en- 
sure that no spurious floating-point traps are gener- 
ated. 


After a RESET the i860 XR microprocessor starts execution at
supervisor level (U=0). Before branching to the first
user-level instruction, the RESET trap handler or subsequent
initialization code has to set PU and a trap bit so that an
indirect branch instruction will copy PU to U, thereby changing
to user level.


2.9 Debugging 


The i860 XR microprocessor supports debugging 
with both data and instruction breakpoints. The fea- 
tures of the i860 XR architecture that support debug- 
ging include: 


• db (data breakpoint register), which permits specification
  of a data address that the i860 XR microprocessor will
  monitor.




• BR (break read) and BW (break write) bits of the psr, which
  enable trapping of either reads or writes (respectively) to
  the address in db.


• DAT (data access trap) bit of the psr, which allows the trap
  handler to determine when a data breakpoint was the cause of
  the trap.


• trap instruction that can be used to set breakpoints in code.
  Any number of code breakpoints can be set. The values of the
  isrc1 and isrc2 fields help identify which breakpoint has
  occurred.


• IT (instruction trap) bit of the psr, which allows the trap
  handler to determine when a trap instruction was the cause of
  the trap.


3.0 HARDWARE INTERFACE 


In the following description of hardware interface, 
the # symbol at the end of a signal name indicates 
that the active or asserted state occurs when the 
signal is at a low voltage. When no # is present after 
the signal name, the signal is asserted when at the 
high voltage level. 


3.1 Signal Description 


Table 3.1 identifies functional groupings of the pins, lists
every pin by its identifier, gives a brief description of its
function, and lists some of its characteristics. All output
pins are tristate, except HLDA and BREQ. All inputs are
synchronous, except HOLD and INT.


3.1.1 CLOCK (CLK) 


The CLK input determines execution rate and timing 
of the i860 XR microprocessor. Timing of other sig- 
nals is specified relative to the rising edge of this 
signal. The i860 XR microprocessor can utilize a 
clock rate of 25 MHz, 33.3 MHz or 40 MHz. The
internal operating frequency is the same as the
external clock.


3.1.2 SYSTEM RESET (RESET) 


Asserting RESET for at least 16 CLK periods causes
initialization of the i860 XR microprocessor. Refer to
section 3.2 "Initialization" for more details related to
RESET.


3.1.3 BUS HOLD (HOLD) AND BUS HOLD
ACKNOWLEDGE (HLDA)


These pins are used for i860 XR microprocessor bus arbitration.
At some clock after the HOLD signal is asserted, the i860 XR
microprocessor releases control of the local bus and puts all
bus interface outputs (except BREQ and HLDA) into a floating
state, then asserts HLDA, all during the same clock period. It
maintains this state until HOLD is deasserted. Instruction
execution stops only if required instructions or data cannot be
read from the on-chip instruction and data caches.


The time required to acknowledge a hold request is one clock
plus the number of clocks needed to finish any outstanding bus
cycles. HOLD is recognized even while RESET or LOCK# is
asserted.


When leaving a bus hold, the i860 XR microprocessor deactivates
HLDA and, in the same clock period, initiates a pending bus
cycle, if any.


HOLD is an asynchronous input.


Table 3.1. Pin Summary

Pin Name     Function                    Active State  Input/Output
-----------  --------------------------  ------------  ------------

Execution Control Pins
CLK          Clock                                     I
RESET        System reset                High          I
HOLD         Bus hold                    High          I
HLDA         Bus hold acknowledge        High          O
BREQ         Bus request                 High          O
INT/CS8      Interrupt, code-size        High          I

Bus Interface Pins
A31-A3       Address bus                 High          O
BE7#-BE0#    Byte enables                Low           O
D63-D0       Data bus                    High          I/O
LOCK#        Bus lock                    Low           O
W/R#         Write/Read bus cycle        High/Low      O
NENE#        Next near                   Low           O
NA#          Next address request        Low           I
READY#       Transfer acknowledge        Low           I
ADS#         Address status              Low           O

Cache Interface Pins
KEN#         Cache enable                Low           I
PTB          Page table bit              High          O

Testability Pins
SHI          Boundary scan shift input   High          I
BSCN         Boundary scan enable        High
SCAN         Shift scan path             High

Intel-Reserved Configuration Pins
CC1-CC0      Configuration

Power and Ground Pins
Vcc          System power
Vss          System ground

A # after a pin name indicates that the signal is active when at
the low voltage level.


3.1.4 BUS REQUEST (BREQ) 


This signal is asserted when the i860 XR microproc- 
essor has a pending memory request, even when 
HLDA is asserted. This allows an external bus arbi- 
ter to implement an ‘‘on demand only” policy for 
granting the bus to the i860 XR microprocessor. 
BREQ is asserted the clock after the i860 XR microprocessor
realizes an internal request for the bus. In normal operation,
BREQ goes low the clock after ADS# goes low for the final
pending bus cycle. (Refer to Figure 4.10 for timing
information.) During data or instruction cache fills, however,
BREQ may be deasserted for one or more clocks, due to cache and
TLB logic.


3.1.5 INTERRUPT/CODE-SIZE (INT/CS8) 


This input allows interruption of the current instruction
stream. If interrupts are enabled (IM set in psr) when INT is
asserted, the i860 XR microprocessor fetches the next
instruction from address 0xFFFFFF00. To assure that an
interrupt is recognized, INT should remain asserted until the
software acknowledges the interrupt (by writing, for example,
to a memory-mapped port of an interrupt controller). When the
bus is not locked, the maximum time between the assertion of
INT and the execution of the first instruction of the trap
handler is ten clocks, plus the time for four sets of four
pipelined read cycles and two sets of four pipelined writes
(instruction- and data-cache misses and write-back cycles to
update memory), plus the time for twenty nonpipelined read
cycles (six TLB misses, with eight refetches when the A-bit is
zero), plus the time for eight nonpipelined writes (updates to
the A-bit).
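

The worst-case sum above can be written out as a small calculation.
The function and parameter names are hypothetical; the per-cycle
clock counts depend on the memory system and must be supplied by the
system designer.

```python
def max_interrupt_latency(read_burst, write_burst, read_single, write_single):
    """Worst-case clocks from INT assertion to the first handler instruction.

    read_burst   -- clocks for one set of four pipelined read cycles
    write_burst  -- clocks for one set of four pipelined writes
    read_single  -- clocks for one nonpipelined read cycle
    write_single -- clocks for one nonpipelined write
    Applies only when the bus is not locked.
    """
    return (10
            + 4 * read_burst
            + 2 * write_burst
            + 20 * read_single
            + 8 * write_single)
```

As an illustration with made-up memory timings of 5 clocks per
four-cycle pipelined set and 2 clocks per nonpipelined cycle, the
bound is 10 + 20 + 10 + 40 + 16 = 96 clocks.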


If the bus is locked from a lock instruction, the INT 
pin is ignored and the INT bit of epsr is always zero. 
The lock instruction can only assert LOCK# for 30-33
instructions before trapping.


If INT is asserted during the clock before the falling 
edge of RESET, the eight-bit code-size mode is selected.
For more about this mode, refer to section
3.2 "Initialization".


INT is an asynchronous input. 


3.1.6 ADDRESS PINS (A31-A3) AND BYTE
ENABLES (BE7#-BE0#)


The 29-bit address bus (A31-A3) identifies addresses to a
64-bit location. Separate byte-enable signals (BE7#-BE0#)
identify which bytes should be accessed within the 64-bit
location. In all noncacheable read cycles (KEN# deasserted),
the byte enables match the length and address of the requested
data. Cacheable read cycles (KEN# asserted), however, result in
four 64-bit memory cycles to fill an entire 32-byte cache line.
The BEn# pins activated are those that represent the operand of
the load instruction that caused the line fill, and these same
BEn# pins remain activated for all four cycles of the line
fill. All 64 bits must be returned for each cycle without
regard for the BEn# signals. In all write cycles (noncacheable
writes as well as cache line write-backs) the BEn# signals
indicate the bytes that must be written.
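

For a single noncacheable data access, the relation between a byte
address, the operand length, and the pins can be sketched as
follows. This example assumes that BEn# selects byte lane n of the
64-bit location (lane 0 on D7-D0) and that the operand is naturally
aligned and at most eight bytes long; the function name is
hypothetical.

```python
def bus_address_and_enables(address, length):
    """Return (value on A31-A3, levels on BE7#..BE0#) for one access.

    The second value is an 8-bit number whose bit n is the electrical
    level of BEn#; because the pins are active low, 0 means asserted.
    """
    offset = address & 7
    if length not in (1, 2, 4, 8) or offset % length != 0:
        raise ValueError("operand must be 1, 2, 4 or 8 bytes and aligned")
    a31_a3 = address >> 3                      # address of the 64-bit location
    selected = ((1 << length) - 1) << offset   # bit n set means byte n is accessed
    be_levels = ~selected & 0xFF               # invert for active-low pins
    return a31_a3, be_levels
```

A four-byte access at address 0x1004, for example, selects the upper
four byte lanes, so BE7# through BE4# are driven low and BE3#
through BE0# stay high.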


Instruction fetches (W/R# is low) are distinguished from data
accesses by the unique combinations of BE7#-BE0# defined in
Table 3.2. For an eight-bit code fetch in eight-bit code-size
(CS8) mode, BE2#-BE0# are redefined to be A2-A0 of the address.
In this case BE7#-BE3# form the code shown in Table 3.2 that
identifies an instruction fetch. The A2 in the table does not
represent a physical pin, just a conceptual internal address
line value. The "x" under A2 for CS8 mode means "not
applicable", or "don't care". All other combinations of byte
enables indicate data accesses.


The address and byte-enable pins are driven until either NA# or
READY# is asserted.


3.1.7 DATA PINS (D63-D0) 


The bus interface has 64 bidirectional data pins (D63-D0) to
transfer data in eight- to 64-bit quantities. Pins D7-D0
transfer the least significant byte; pins D63-D56 transfer the
most significant byte.


In read bus cycles, all 64 bits of the data bus are 
latched, even in CS8-mode instruction fetches when 
only the low-order eight bits are used. 


In write bus cycles, the point at which data is driven onto the
bus depends on the type of the preceding cycle. If there was no
preceding cycle (i.e. the bus was idle), data is driven with
the address. If the preceding cycle was a write, data is driven
as soon as READY# is returned from the previous cycle. If the
preceding cycle was a read, data is driven one clock after
READY# is returned from the previous cycle, thereby allowing
time for the bus to be turned around. Data continues to be
driven until READY# for the current cycle is returned.


3.1.8 BUS LOCK (LOCK #) 


This signal is used to provide atomic (indivisible) 
read-modify-write sequences in multiprocessor sys- 
tems. A multiprocessor bus arbiter must permit only 
one processor a locked access to the address which 
is on the bus when LOCK# first activates. The system
must maintain the lock of that location until
LOCK# deactivates.


The i860 XR microprocessor coordinates the exter- 
nal LOCK# signal with the software-controlled BL 
bit of the dirbase register. Programmers do not 
have to be concerned about the fact that bus activity 
is not always synchronous with instruction execu- 
tion. LOCK # is asserted with ADS # for the address 
operand of the first load or store instruction execut- 
ed after the BL bit is set by the lock instruction. 
Pending bus cycles are locked according to the val- 
ue of the BL bit when the instruction was executed. 
Even if the BL bit is changed between the time that 
an instruction generates an internal bus request and 
the time that the cycle appears on the bus, the i860
XR microprocessor still asserts LOCK# for that bus
cycle.


If ADS# is active when LOCK# deactivates, then that request
should complete before the hardware relinquishes the lock. If
ADS# is not active, the locking of the location can immediately
end when LOCK# deactivates. Of course the simplest arbitration
hardware can just lock the entire bus against all other
accesses during LOCK# assertion through READY# of the cycle in
which LOCK# goes inactive.


Table 3.2. Identifying Instruction Fetches

[Table body not recoverable. Surviving row labels under "Code
Fetch": Normal (Non-CS8), Normal (Non-CS8), CS8 Mode.]


When the BL bit is deasserted with the unlock in- 
struction, LOCK# is deasserted with the next load 
or store but after any pending bus cycles. Between 
locked sequences, at least one cycle of no LOCK# 
is guaranteed by the behavior of the unlock instruc- 
tion. LOCK# deassertion may occur independently 


of ADS# for the case of a trap ora cache hit after 
unlock. 


The i860 XR microprocessor also asserts LOCK # 
during TLB miss processing for updates of the ac- 
cessed bit in page-table entries. The maximum time 
that LOCK# can be asserted in this case is five 
clocks plus the time required to perform a
read-modify-write sequence. Instruction fetches do not alter
the LOCK# pin.


Between lock and unlock instructions, the INT pin is ignored
and the INT bit of epsr is zero when read by ld.c epsr. The
time that interrupts are disabled is limited by the lock
protocol outlined in Section 2.8.2.


3.1.9 WRITE/READ BUS CYCLE (W/R#)


This pin specifies whether a bus cycle is a read (LOW) or write
(HIGH) cycle. It is driven until either NA# or READY# is
asserted.


3.1.10 NEXT NEAR (NENE#)


This signal allows higher-speed reads and writes in the case of
consecutive reads and writes that access static column or
page-mode DRAMs. The i860 XR microprocessor asserts NENE# when
the current address is in the same DRAM page as the previous
bus cycle. The i860 XR microprocessor determines the DRAM page
size by inspecting the DPS field in the dirbase register. The
page size can range from 2^9 to 2^16 64-bit words, supporting
DRAM sizes from 256K x 1, 256K x 4, and up. NENE# is never
asserted on the next bus cycle after HLDA is deasserted.
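

The same-page comparison can be sketched as follows. The page size
is passed directly as the base-2 logarithm of the number of 64-bit
words, so the encoding of the DPS field itself is not modeled; the
function name is hypothetical, and the rule about the first cycle
after HLDA is deasserted is left out.

```python
def same_dram_page(prev_address, address, log2_page_words):
    """True if two byte addresses fall in the same DRAM page.

    log2_page_words is between 9 and 16, matching page sizes of
    2^9 to 2^16 64-bit words.
    """
    if not 9 <= log2_page_words <= 16:
        raise ValueError("page size must be 2^9 to 2^16 64-bit words")
    shift = log2_page_words + 3   # each 64-bit word holds 8 bytes
    return (prev_address >> shift) == (address >> shift)
```

With the smallest page size (2^9 words, or 4 Kbytes), byte addresses
0x0FF8 and 0x1000 fall in different pages; with 2^10 words they
share one.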


3.1.11 NEXT ADDRESS REQUEST (NA#)


NA# makes address pipelining possible. The system asserts NA#
for at least one clock to indicate that it is ready to accept
the next address from the i860 XR microprocessor. NA# may be
asserted before the current cycle ends. (If the system does not
implement pipelining, NA# does not have to be activated.) The
i860 XR microprocessor samples NA# every clock, starting one
clock after the prior activation of ADS#. When NA# is active,
the i860 XR microprocessor is free to drive address and
bus-cycle definition for the next pending bus cycle. The i860
XR microprocessor remembers that NA# was asserted when no
internal request is pending; therefore, NA# can be deactivated
after the next rising edge of the CLK signal. Up to three bus
cycles can be outstanding simultaneously.


3.1.12 TRANSFER ACKNOWLEDGE (READY #) 


The system must assert the READY# signal during read cycles
when valid data is on the data pins and during write cycles
when the system has accepted data from the data pins. READY#
must be asserted for at least one clock. Sampling of READY#
begins in the clock after an ADS# or in the same clock after a
prior READY#.


3.1.13 ADDRESS STATUS (ADS#) 


The i860 XR microprocessor asserts ADS# during the first clock
of each bus cycle to identify the clock period during which it
begins to assert outputs on the address bus. This signal is
held active for one clock.


3.1.14 CACHE ENABLE (KEN#)


The i860 XR microprocessor samples KEN# to determine whether
the data being read for the current cache-miss cycle is to be
cached. This pin is internally NORed with the CD and WT bits to
control cacheability on a page-by-page basis (refer to Table
3.3).


If the address is one that is permitted to be in the cache,
KEN# must be continuously asserted during the sampling period
starting from the second rising clock edge after ADS# is
asserted, through the clock NA# or READY# is asserted. The
entire 64 bits of the data bus will be used for the read,
regardless of the state of the byte-enable pins. Three
additional 64-bit bus cycles will be generated to fill the rest
of the 32-byte cache block.

If KEN# is found deasserted at any clock from the clock
after ADS# through the clock of the first NA# or READY#,
the data being read will not be cached and two scenarios
can occur: 1) if the cycle is due to a data-cache miss, no
subsequent cache-fill cycles will be generated; 2) if the
cycle is due to an instruction-cache miss, additional
cycle(s) will be generated until the address reaches a
32-byte boundary. To avoid caching a line, external
hardware must deassert KEN# during or before the first NA#
or READY#.
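
The cache-fill rules above can be sketched in a few lines of Python. This is an illustrative model only, not Intel code: the function names, the list-of-samples representation of KEN#, and the way the instruction-fetch case counts 8-byte reads up to the next 32-byte boundary are assumptions made for the example.

```python
LINE_BYTES = 32  # cache block size stated in the text
BUS_BYTES = 8    # one 64-bit bus cycle transfers eight bytes


def is_cached(ken_asserted_samples, cd, wt):
    """Return True if the read is cached.

    ken_asserted_samples holds one boolean per sampled clock, True when
    KEN# was found asserted (a hypothetical representation). CD and WT are
    the page-table bits; either one set makes the page noncacheable.
    """
    return all(ken_asserted_samples) and not (cd or wt)


def additional_cycles(address, cached, instruction_fetch):
    """Bus cycles generated after the first 64-bit read of a cache miss."""
    if cached:
        # Three more 64-bit cycles fill the rest of the 32-byte block.
        return LINE_BYTES // BUS_BYTES - 1
    if instruction_fetch:
        # Assumed reading of the text: keep fetching 8-byte units until
        # the address reaches the next 32-byte boundary.
        next_addr = (address // BUS_BYTES + 1) * BUS_BYTES
        boundary = (address // LINE_BYTES + 1) * LINE_BYTES
        return (boundary - next_addr) // BUS_BYTES
    # Uncached data read: no subsequent cache-fill cycles.
    return 0
```

Under these assumptions, an uncached instruction fetch at an address eight bytes into a block is followed by two more reads, while a cached miss is always followed by three.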


3.1.15 PAGE TABLE BIT (PTB) 


Depending on the setting of the PBM (page-table bit 
mode) bit of the epsr, the PTB reflects the value of 
either the CD (cache disable) bit or the WT (write 
through) bit of the page-table entry used for the cur- 
rent cycle. When paging is disabled, PTB remains 
inactive. 


Table 3.3. Cacheability based on KEN# and CD OR WT

CD OR WT   KEN#   Meaning
   0        0     Cacheable access
   0        1     Noncacheable access
   1        0     Noncacheable page
   1        1     Noncacheable page


3.1.16 BOUNDARY SCAN SHIFT INPUT (SHI)


This pin is used with the testability features. Refer to
section 3.3.

3.1.17 BOUNDARY SCAN ENABLE (BSCN)

This pin is used with the testability features. Refer to 
section 3.3. 

3.1.18 SHIFT SCAN PATH (SCAN) 

This pin is used with the testability features. Refer to 
section 3.3.

3.1.19 CONFIGURATION (CC1-CC0)


These two pins are reserved by Intel. Strap both pins 
LOW. 


3.1.20 SYSTEM POWER (Vcc) AND GROUND (Vss)


The i860 XR microprocessor has 48 pins for power and
ground. All pins must be connected to the appropriate
low-inductance power and ground signals in the system.


3.2 Initialization 


Initialization of the i860 XR microprocessor is 
caused by assertion of the RESET signal for at least 
16 clocks. Table 3.4 shows the status of output pins 
during the time that RESET is asserted. Note that 
HOLD requests are honored during RESET and that 
the status of output pins depends on whether a 
HOLD request is being acknowledged. 

Table 3.4. Output Pin Status during Reset

                      Pin Value
Pin Name              HOLD Not Acknowledged   HOLD Acknowledged
ADS#, LOCK#           HIGH                    Tri-State OFF
W/R#, PTB             [illegible]             Tri-State OFF
BREQ                  [illegible]             [illegible]
HLDA                  [illegible]             HIGH
D63-D0                Tri-State OFF           Tri-State OFF
BE7#-BE0#, NENE#      Undefined               Tri-State OFF

After a reset, the i860 XR microprocessor begins executing
at physical address 0xFFFFFF00. The program-visible state
of the i860 XR microprocessor after reset is detailed in
section 2.8.7.


Eight-bit code-size mode is selected when INT/CS8 is
asserted during the clock before the falling edge of RESET.
While in eight-bit code-size mode, instruction cache misses
are byte reads (transferred on D7-D0 of the data bus)
instead of eight-byte reads. This allows the i860 XR
microprocessor to be bootstrapped from an eight-bit EPROM.
For these code reads, byte enables BE2#-BE0# are redefined
to be the low order three bits of the address, so that a
complete byte address is available. These reads update the
instruction cache if KEN# is asserted (refer to section
3.1.14) and are not pipelined even if NA# is asserted.
While in this mode, instructions must reside in an
eight-bit wide memory, while data must reside in a separate
64-bit wide memory. After the code has been loaded into
64-bit memory, initialization code can initiate 64-bit code
fetches by clearing the CS8 bit of the dirbase register
(refer to section 2). Once eight-bit code-size mode is
disabled by software, it cannot be reenabled except by
resetting the i860 XR microprocessor.
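
As a small illustration of how external decode logic might form the EPROM address in eight-bit code-size mode, the sketch below concatenates the A31-A3 pins with the redefined BE2#-BE0# pins. The function name and the integer encoding of the pin groups are assumptions made for this example; the text states only that BE2#-BE0# carry the low order three address bits.

```python
def cs8_byte_address(a31_a3, be2_be0):
    """Combine A31-A3 with BE2#-BE0# into a 32-bit byte address.

    a31_a3  : the 29 address bits driven on A31-A3, as an integer
    be2_be0 : the three pins BE2#-BE0# read as address bits 2..0
              (assumed here to be taken at their logic value)
    """
    if not 0 <= a31_a3 < (1 << 29):
        raise ValueError("A31-A3 must fit in 29 bits")
    if not 0 <= be2_be0 < 8:
        raise ValueError("BE2#-BE0# must fit in 3 bits")
    return (a31_a3 << 3) | be2_be0


# First instruction byte after reset, then the byte five positions later.
reset_vector = cs8_byte_address(0xFFFFFF00 >> 3, 0)
sixth_byte = cs8_byte_address(0xFFFFFF00 >> 3, 5)
```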


3.3 Testability 


The i860 XR microprocessor has a boundary scan 
mode that may be used in component- or board-lev- 
el testing to test the signal traces leading to and 
from the i860 XR microprocessor. Boundary scan 
mode provides a simple serial interface that makes it 
possible to test all signal traces with only a few 
probes. Probes need be connected only to CLK, 
BSCN, SCAN, SHI, BREQ, RESET, and HOLD. 


The pins BSCN and SCAN control the boundary scan mode
(refer to Table 3.5). When BSCN is asserted, the i860 XR
microprocessor enters boundary scan mode on the next rising
clock edge. Boundary scan mode can be activated even while
RESET is active. When BSCN is deasserted while in boundary
scan mode, the i860 XR microprocessor leaves boundary scan
mode on the next rising clock edge. After leaving boundary
scan mode, the internal state is undefined; therefore,
RESET should be asserted.


Table 3.5. Test Mode Selection

BSCN   SCAN   Testability Mode
 LO     LO    No testability mode selected
 LO     HI    (Reserved for Intel)
 HI     LO    Boundary scan mode, normal
 HI     HI    Boundary scan mode, shift
              (SHI as input; BREQ as output)

For testing purposes, each signal pin has associated with
it an internal latch. Table 3.6 identifies these latches by
name and classifies them as input, output, or control. The
input and output latches carry the name of the
corresponding pins.


Table 3.6. Test Mode Latches


Associated Control Latch


SHI
BSCN
SCAN
RESET
D0-D63
CC1-CC0


READY#
KEN#
NA#
INT/CS8
HOLD


BE7#-BE0#
BREQ


Within boundary scan mode the i860 XR microproc- 
essor operates in one of two submodes: normal 
mode or shift mode, depending on the value of the 
SCAN input. A typical test sequence is:

1. Enter shift mode to assign values to the latches
that correspond with the pins.


2. Enter normal mode. In normal mode the i860 XR
microprocessor transfers the latched values to
the output pins and latches the values that are
being driven onto the input pins.


3. Reenter shift mode to read the new values of the 
input pins. 


3.3.1 NORMAL MODE


When SCAN is deasserted, the normal mode is selected. For
each input pin (RESET, HOLD, INT/CS8, NA#, READY#, KEN#,
SHI, BSCN, SCAN, CC1, and CC0), the corresponding latch is
loaded with the value that is being driven onto the pin.


The tristate output pins (A31-A3, BE7#-BE0#, W/R#, NENE#,
ADS#, LOCK#, and PTB) are enabled by the control latches
ADDRt (for A31-A3), BEt, W/Rt, NENEt, ADSt, LOCKt, and
PTBt. If a control latch is set, the corresponding output
latches drive their output pins; otherwise the pins are not
driven.


The I/O pins (D63-D0) are enabled by the control latch
DATAt, which is similar to the other control latches. In
addition, when DATAt is not set, the data pins are treated
as input pins and their values are latched.


3.3.2 SHIFT MODE 


When SCAN is asserted, the shift mode is selected. 
In shift mode, the pins are organized into a boundary 
scan chain. The scan chain is configured as a shift 
register that is shifted on the rising edge of CLK. The 
SHI pin is connected to the input of one end of the 
boundary scan chain. The value of the most signifi- 
cant bit of the scan chain is output on the BREQ pin. 
To avoid glitches while the values are being shifted 
along the chain, the tester should assert both the 
RESET and HOLD pins. Then all tristate outputs are 
disabled. The order of the pins within the chain is 
shown in Figure 3.1. 




A tester causes entry into this mode for one of two 

purposes: 

1. To assign values to output latches to be driven
onto output pins upon subsequent entry into normal
mode.


2. To read the values of input pins previously latched 
in normal mode. 
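
The shift-mode behavior can be modeled as a plain shift register, which may help when writing tester software. The sketch below is an assumption-laden illustration: the class name, the chain length parameter, and the convention that index 0 is the end fed by SHI while the last element drives BREQ are choices made for the example, not details taken from the device.

```python
class ScanChain:
    """Toy model of the boundary scan chain in shift mode."""

    def __init__(self, length):
        # latches[0] is the end fed by SHI; latches[-1] is seen on BREQ.
        self.latches = [0] * length

    @property
    def breq(self):
        """Value currently presented on the BREQ pin."""
        return self.latches[-1]

    def clock(self, shi):
        """Model one rising edge of CLK: shift every latch one position."""
        self.latches = [shi & 1] + self.latches[:-1]


def shift_in(chain, bits):
    """Load a pattern, one bit per clock (purpose 1 in the list above)."""
    for bit in bits:
        chain.clock(bit)


def shift_out(chain):
    """Read every latch through BREQ (purpose 2), shifting in zeros."""
    values = []
    for _ in range(len(chain.latches)):
        values.append(chain.breq)
        chain.clock(0)
    return values
```

In this model the first bit shifted in is also the first bit read back on BREQ, so a pattern loaded with `shift_in` is returned in the same order by `shift_out`.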


4.0 BUS OPERATION 


A bus cycle begins when ADS# is activated and ends when
READY# is sampled active. READY# is sampled one clock after
assertion of ADS# and thereafter until it becomes active.
New cycles can start as often as every other clock until
three cycles are outstanding. A bus cycle is considered
outstanding as long as READY# has not been asserted to
terminate that cycle. After READY# becomes active, it is
not sampled again for the following (outstanding) cycle
until the second clock after the one during which it became
active. READY# is assumed to be inactive when it is not
sampled.


With regard to how a bus cycle is generated by the i860 XR
microprocessor, there are two types of cycles: pipelined
and nonpipelined. Both types of cycles can be either read
or write cycles. A pipelined cycle is one that starts while
one or two other bus cycles are outstanding. A nonpipelined
cycle is one that starts when no other bus cycles are
outstanding.


4.1 Pipelining 


An m-n read or write cycle is a cycle with a total cycle
time of m clocks and a cycle-to-cycle time of n clocks
(m ≥ n). Total cycle time extends from the clock in which
ADS# is activated to the clock in which READY# becomes
active, whereas cycle-to-cycle time extends from the time
that READY# is sampled active for the previous cycle to the
time that it is sampled active again for the current cycle.
When m = n, a nonpipelined cycle is implied; m > n implies
a pipelined cycle.
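
The m-n notation can be captured in a short helper, given here as an illustrative sketch. The function name is hypothetical, and the lower bound of two clocks on n is an assumption drawn from the statement that new cycles can start at most every other clock.

```python
def classify_cycle(m, n):
    """Classify an m-n bus cycle.

    m: total cycle time in clocks (ADS# clock through READY# clock)
    n: cycle-to-cycle time in clocks (READY# to READY#)
    """
    if n < 2:
        raise ValueError("cycle-to-cycle time is at least two clocks")
    if m < n:
        raise ValueError("total cycle time cannot be shorter than n")
    return "nonpipelined" if m == n else "pipelined"


examples = {label: classify_cycle(m, n)
            for label, (m, n) in {"2-2": (2, 2), "3-3": (3, 3), "5-2": (5, 2)}.items()}
```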


Figure 3.1. Order of Boundary Scan Chain
[Chain order as far as legible: SCAN, RESET, DATAt, ...;
72 A31 ... 100 A3, 101 ADDRt, 102 NENEt, 103 NENE#,
104 PTBt, ...; 107 W/R#, 109 ADS#, 110 HLDA, 111 LOCKt,
112 LOCK#, 113 READY#, ...; 116 INT/CS8, 118 BEt,
119 BE7# ... 126 BE0#, 127 BREQ]

Pipelining may occur for the next bus cycle any time the
current bus cycle requires more than two clock periods to
finish (m > 2). If a bus request is pending, the next cycle
will be initiated when NA# is sampled active, even if the
current cycle has not terminated. In this case, pipelining
occurs. NA# is not recognized until after ADS# has become
inactive.


To allow high transfer rates in large memory systems,
two-level pipelining is supported (i.e., there may be up to
three cycles in progress at one time). Pipelining enables a
new word of data to be transferred every two clocks, even
though the total cycle time may be up to six clocks.
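
The arithmetic behind these figures is worked through below as a sketch. The function names are hypothetical; the 64-bit bus width, the two-clock cycle-to-cycle time, and the limit of three outstanding cycles come from the text, and the 40 MHz clock is one of the frequencies listed in the electrical data.

```python
def peak_bandwidth(clk_hz, clocks_per_transfer=2, bus_bytes=8):
    """Peak transfer rate in bytes per second for back-to-back transfers."""
    return clk_hz * bus_bytes / clocks_per_transfer


def longest_full_rate_cycle(max_outstanding=3, clocks_per_transfer=2):
    """Longest total cycle time (in clocks) that still sustains one
    transfer every clocks_per_transfer clocks."""
    return max_outstanding * clocks_per_transfer


rate_40mhz = peak_bandwidth(40_000_000)   # bytes per second
limit = longest_full_rate_cycle()         # clocks
```

With three cycles in flight and a new one completing every two clocks, a memory that needs up to six clocks per access can still be kept at the full transfer rate.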


4.2 Bus State Machine 


The operation of the bus is described in terms of a 
bus state machine using a state transition diagram. 
Figure 4.1 illustrates the i860 XR microprocessor 
bus state machine. A bus cycle is composed of two 
or more states. Each bus state lasts for one CLK 
period. 


The i860 XR microprocessor supports up to two levels of
address pipelining. Once it has started the first bus
cycle, it can generate up to two more cycles as long as
READY# remains inactive. To start a new bus cycle while
other cycles are still outstanding, NA# must be active for
at least one clock cycle starting with the clock after the
previous ADS#. NA# is latched internally.


States Tj and Tjk, for j = {1,2,3} and k = {1,2}, are used
to describe the state of the i860 XR microprocessor Bus
State Machine. Index j indicates the number of outstanding
bus cycles while index k distinguishes the intermediate
states for the j-th outstanding cycle. Therefore there can
be up to three outstanding cycles, and there are two
possible intermediate states for each level of pipelining.
Tj1 is the next state after Tj, as long as j cycles are
outstanding. Tj2 is entered when NA# is active but the i860
XR microprocessor is not ready to start a new cycle.


Five conditions have to be met to start a new cycle 
while one or more cycles are already pending: 

1. READY# inactive

2. NA# having been active 

3. An internal request pending (BREQ active) 

4. HOLD not active 

5. Fewer than three cycles outstanding 
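
The five conditions translate directly into a predicate, sketched below for use in a bus-model or testbench. The function and parameter names are hypothetical; each argument corresponds to one condition in the list, with `na_latched` standing for NA# having been active and `hold_active` for the internally synchronized HOLD.

```python
MAX_OUTSTANDING = 3  # limit stated in the text


def can_start_new_cycle(ready_active, na_latched, request_pending,
                        hold_active, outstanding):
    """True when a new bus cycle may start while others are pending."""
    if not 1 <= outstanding <= MAX_OUTSTANDING:
        raise ValueError("rule applies with one to three cycles outstanding")
    return (not ready_active              # 1. READY# inactive
            and na_latched                # 2. NA# having been active
            and request_pending           # 3. internal request pending
            and not hold_active           # 4. HOLD not active
            and outstanding < MAX_OUTSTANDING)  # 5. fewer than three
```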


Note that BREQ is asserted on the clock after the 
i860 XR microprocessor realizes an internal request 
for the bus. 


Upon hardware RESET, the bus control logic enters the idle
state Ti and awaits an internal request for a bus cycle. If
a bus cycle is requested while there is no hold request
from the system, a bus cycle begins, advancing to state T1.
On the next cycle, the state machine automatically advances
to state T11. If READY# is active in state T11, the bus
control logic returns either to Ti, if no new cycle is
started, or to T1, if a new cycle request is pending
internally. In fact, if an internal bus request is pending
each time READY# is active, the state machine continues to
cycle between T1 and T11.


However, if READY# is not active but the next address
request is pending (as indicated by an active NA#), the
state machine advances either to state T2 (if an internal
bus request is pending, signifying that two bus cycles are
now outstanding), or to state T12 (if no internal bus
request is pending, signifying NA# has been found active).
Transitions from state T12 are similar to those from T11.

Figure 4.1. Bus State Machine

NOTES:
READY#    Once READY# has been sampled active, it is not
          sampled again until two clocks later
NA#       Not sampled during ADS# active clock
ADS#      Active in T1, T2 and T3
HLDA      Active in Th
HOLD      HOLD in this figure is the internally
          synchronized version of the external signal HOLD
REQUEST   Internal Bus Request Pending (BREQ asserted)


If two bus cycles are already outstanding (as indicated by
T2k, for k = {1,2}) and NA# is latched active but READY# is
not active, one more bus request causes entry into state
T3. Transitions from this state are similar to those from
T2.


In general, if there is an internal bus request each time
both READY# and NA# are active, the state machine continues
to oscillate between Tj1 and Tj, for j = {2,3}.


When NA# is sampled active while there is a pending bus
request, ADS# is activated in the next clock period
(provided no more than two cycles are already outstanding).

Internal pending bus requests start new bus cycles only if
no HOLD request has been recognized. Th is entered from the
idle state Ti, T11, and T12. HLDA is active in this state.
There is a one clock delay to synchronize the HOLD input
when the signal meets the respective minimum setup and hold
time requirements. The state machine uses the synchronized
HOLD to move from state to state.


4.3 Bus Cycles


Figures 4.2 through 4.10 illustrate combinations of bus
cycles.


4.3.1 NONPIPELINED READ CYCLES


A read cycle begins with the clock in which ADS# is
asserted. The i860 XR microprocessor begins driving the
address during this clock. It samples READY# for active
state every clock after the first clock. A minimum of two
clocks is required per cycle. Data is latched when READY#
is found active when sampled at the end of a clock period.
Figure 4.2 illustrates nonpipelined read cycles with zero
wait states.


Figure 4.2. Fastest Read Cycles
[Timing diagram: Cycles 1-3, each a nonpipelined read
(2-2) passing through states T1 and T11]

Figure 4.3. Fastest Write Cycles
[Timing diagram: Cycles 1-3, each a nonpipelined write
(2-2)]


4.3.2 NONPIPELINED WRITE CYCLES 


The ADS# and READY # activity for write cycles 
follows the same logic as that for read cycles, as 
Figure 4.3 illustrates for back-to-back, nonpipelined 
write cycles with zero wait-states. 


The fastest write cycle takes only two clocks to complete.
However, when a read cycle immediately precedes a write
cycle, the write cycle must contain a wait state, as
illustrated in Figure 4.4. Because the device being read
might still be driving the data bus during the first clock
of the write cycle, there is a potential for bus
contention. To help avoid such contention, the i860 XR
microprocessor does not drive the data bus until the second
clock of the write cycle. The wait state is required to
provide the additional time necessary to terminate the
write cycle. In other read-write combinations, the i860 XR
microprocessor does not require a wait state.
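
The wait-state rule can be expressed as a small calculation of the minimum clock count for a back-to-back, nonpipelined sequence. This is a sketch under stated assumptions: the function name and the "R"/"W" string encoding are hypothetical, and every cycle is assumed to complete in the minimum time the text allows, with no idle states between cycles.

```python
def min_clocks(sequence):
    """Minimum clocks for back-to-back nonpipelined cycles.

    sequence: iterable of "R" (read) and "W" (write) in bus order.
    """
    total = 0
    previous = None
    for op in sequence:
        if op not in ("R", "W"):
            raise ValueError("use 'R' for read and 'W' for write")
        # A write that immediately follows a read needs one wait state.
        total += 3 if (op == "W" and previous == "R") else 2
        previous = op
    return total


read_then_two_writes = min_clocks("RWW")  # 2 + 3 + 2 clocks
```

The read-write-write case corresponds to the (2-2), (3-3), (2-2) sequence labeled in Figure 4.9.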

Figure 4.4. Fastest Read/Write Cycles


Figure 4.5. Pipelined Read Followed by Pipelined Write


Figure 4.6. Pipelined Write Followed by Pipelined Read
[Timing diagram: Cycle 1 nonpipelined write (5-5), Cycle 2
pipelined write (5-2), Cycles 3 and 4 pipelined reads
(5-2)]


4.3.3 PIPELINED READ AND WRITE CYCLES 


Figures 4.5 and 4.6 illustrate combinations of nonpipelined
and pipelined read and write cycles. The following
description applies to both diagrams. While Cycle 1 is
still in progress, two new cycles are initiated. By the
time READY# first becomes active, the state machine has
moved through states T1, T11, T2, T21, and T3. Cycles 3 and
4 show how activating READY# terminates the corresponding
outstanding cycle, and yet activating NA# while there is an
internal request pending adds a new outstanding cycle.


In Figure 4.5, Cycle 3 is a write cycle following a read
cycle; therefore, one wait state must be inserted. The i860
XR microprocessor does not drive the data bus until one
clock after the read data is returned from the preceding
read cycle. During Cycles 3 and 4, the state machine
oscillates between states T3 and T31, maintaining full bus
capacity (two levels of pipelining; three outstanding
cycles). Cycles 2, 3, and 4 in Figure 4.6 are 5-2 cycles;
i.e., each requires a total cycle time of five clocks while
the throughput rate is one cycle every two clocks.


Figure 4.7 illustrates in a more general manner how the NA#
signal controls pipelining. Cycle 1 is a 2-2 cycle, the
fastest possible. The next cycle cannot be started any
earlier; therefore, there is no need to activate NA# to
start the next cycle early. Cycle 2, a 3-3 read, is
different. Cycle 3 can be started during the third state (a
wait state) of Cycle 2, and NA# is asserted to accomplish
this.


NA# is not activated following the ADS# clock of Cycle 3,
thereby allowing Cycle 3 to terminate before the start of
Cycle 4. As a result, Cycle 4 is a nonpipelined cycle.

Figure 4.7. Pipelining Driven by NA#


Figure 4.8. NA# Active with No Internal Bus Request


Figure 4.9. Locked Cycles
[Timing diagram: Cycle 1 nonpipelined read (2-2), Cycle 2
nonpipelined write (3-3), Cycle 3 nonpipelined write (2-2)]


When there is no internal bus request, activating NA# does
not start a new cycle; the i860 XR microprocessor, however,
remembers that NA# has been activated. Figure 4.8
illustrates the situation where NA# is active but no
internal bus request is pending. NA# is activated when two
cycles are outstanding. Because there is no internal
request pending until after one idle state, no new bus
cycle is started during that period.


4.3.4 LOCKED CYCLES 


The LOCK # signal is asserted when the current bus 
cycle is to be locked with the next bus cycle. Asser- 
tion of LOCK# may be initiated by a program’s set- 
ting the BL bit of the dirbase register using the lock 
instruction (refer to section 2) or by the i860 XR mi- 
croprocessor itself during page table updates. 


In Figure 4.9, the first read cycle is to be locked with 
the following write cycle. If there were idle states 
between the cycles, the LOCK# signal would re- 
main asserted. This is the case for a read/modify/ 
write operation. Cycle 3 is not locked because 
LOCK # is no longer asserted when Cycle 2 starts. 


4.3.5 HOLD AND BREQ ARBITRATION CYCLES 


The HOLD, HLDA, and BREQ signals permit bus arbitration
between the i860 XR microprocessor and another bus master.


See Figure 4.10. When HOLD is asserted, the i860 
XR microprocessor does not relinquish control of 
the bus until all outstanding cycles are completed. If 
HOLD were asserted one clock earlier, the last i860 
XR microprocessor bus cycle before HLDA would 
not be started. 


HOLD is sampled at the end of the clock in which it is
activated. Recommended setup and hold times must be met to
guarantee sampling one clock after external HOLD
activation. When HOLD is sampled active, a one-clock delay
for internal synchronization follows. Likewise, when HOLD
is deasserted, there is a one-clock delay for internal
synchronization before HLDA is deasserted. The outputs
(except HLDA and BREQ) float when HLDA is asserted.

Figure 4.10. HOLD, HLDA, and BREQ


If, during a HOLD cycle, an internal bus request is
generated, BREQ is activated even though HLDA is asserted.
It remains active at least until the clock after ADS# is
activated for the requested cycle.


4.4 Bus States During RESET 


Figure 4.11 shows how INT/CS8 is sampled during the clock
period just before the falling edge of RESET. If INT/CS8 is
sampled active, the i860 XR microprocessor enters CS8 mode.
No inputs (except HOLD and INT/CS8) are sampled during
RESET.


Note that, because HOLD is recognized even while RESET is
active, the HLDA output signal may also become active
during RESET. Refer to Table 3.4 "Output Pin Status during
Reset".


Figure 4.11. Reset Activities
[Timing diagram: RESET held for at least 16 CLKs; INT/CS8
and other inputs shown]

5.0 MECHANICAL DATA


Figures 5.1 and 5.2 show the locations of pins; Tables 5.1 and 5.2 help to locate pin identifiers.


Figure 5.1. Pin Configuration: View from Top Side

Figure 5.2. Pin Configuration: View from Pin Side

Table 5.1. Pin Cross Reference by Location


Table 5.2. Pin Cross Reference by Pin Name

Table 5.3. Ceramic PGA Package Dimension Symbols

Letter or
Symbol      Description of Dimensions
A           Distance from seating plane to highest point of body
A2          Distance from base plane to highest point of body
B           Diameter of terminal lead pin
D           Largest overall package dimension of length
L           Distance from seating plane to end of lead
S1          Other body dimension, outer lead center to edge of body

NOTES:
1. Controlling dimension: millimeter.
2. Dimension "e1" ("e") is non-cumulative.
3. Seating plane (standoff) is defined by P.C. board hole size: 0.0415-0.0430 inch.
4. Dimensions "B", "B1" and "C" are nominal.
5. Details of Pin 1 identifier are optional.

[Package drawing: seating plane, ØB (all pins), swaged pin
detail, 45° chamfer (index corner); dimensions A, A1, A2,
A3, D1, e1, L, S1]

            Millimeters                 Inches
Symbol    Min     Max     Notes       Min     Max     Notes
A         3.56    4.57                0.140   0.180
A1        0.64    1.14    SOLID LID   0.025   0.045   SOLID LID
A2        2.79    3.56    SOLID LID   0.110   0.140   SOLID LID
A3        1.14    1.40                0.045   0.055
B         0.43    0.51                0.017   0.020
D         44.07   44.83               1.735   1.765
D1        40.51   40.77               1.595   1.605
e1        2.29    2.79                0.090   0.110


Figure 5.3. 168 Lead Ceramic PGA Package Dimensions


6.0 PACKAGE THERMAL SPECIFICATIONS


For this section, let:

P   = maximum power consumption
TC  = case temperature
TA  = ambient air temperature
θCA = thermal resistance from case to ambient air
θJC = thermal resistance from junction to case
θJA = thermal resistance from junction to ambient air


The i860 XR microprocessor is specified for operation when
TC is within the range of 0°C-85°C. TC may be measured in
any environment to determine whether the i860 XR
microprocessor is within specified operating range. The
case temperature should be measured at the center of the
top surface opposite the pins.


TA can be calculated from θCA (thermal resistance from case
to ambient) with the following equation:

TA = TC - P * θCA

Typical values for θCA and θJC at various airflows are
given in Table 6.1 for the 1.75 sq. in., 168 pin, ceramic
PGA. θJC is also shown so that θJA can be calculated by:


θCA = θJA - θJC


Note that θJC with a heatsink differs from θJC without a
heatsink because case temperature is measured differently.
Case temperature for θJC with heatsink is measured at the
center of the heat fin base. Case temperature for θJC
without heatsink is measured at the center of package top
surface.


Table 6.2 shows the maximum TA allowable (without exceeding
TC) at various airflows and operating frequencies (fCLK).


Note that TA is greatly improved by attaching "fins" or a
"heat sink" to the package. P (the maximum power
consumption) is calculated by using the maximum ICC at 5V
as tabulated in the DC Characteristics of section 7.
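
The thermal relations in this section can be worked through numerically, as in the sketch below. The function names are hypothetical, and the input values (a 0.6 A maximum ICC and a θCA of 10 °C/W) are placeholders chosen only to show the arithmetic; the real figures must be taken from Table 7.1 and Table 6.1.

```python
def max_power(icc_max_amps, vcc_volts=5.0):
    """P: maximum power consumption in watts, from maximum ICC at 5V."""
    return icc_max_amps * vcc_volts


def theta_ca(theta_ja, theta_jc):
    """Case-to-ambient resistance: θCA = θJA - θJC (°C/W)."""
    return theta_ja - theta_jc


def max_ambient(tc_max_c, power_w, theta_ca_c_per_w):
    """TA = TC - P * θCA: highest ambient that keeps the case at or
    below tc_max_c."""
    return tc_max_c - power_w * theta_ca_c_per_w


# Placeholder inputs, not datasheet values.
p = max_power(0.6)                 # watts
ta = max_ambient(85.0, p, 10.0)    # °C, with the 85°C case limit
```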


Figure 6.1 gives typical Icc derating with case tem- 
perature. For more information on heat sinks, mea- 
surement techniques, or package characteristics, re- 
fer to Intel Packaging Handbook, order number 
240800. 


Figure 6.1. ICC vs Case Temperature
[Plot: ICC (mA) against case temperature for a typical part
at 5V with maximum load]


Table 6.1. Thermal Resistance (°C/W) θJC and θCA


*Nine-fin, unidirectional heat sink (fin dimensions: 0.350" height, 0.040"
width, 0.115" center-to-center spacing, 1.530" length).

Table 6.2. Maximum Allowable TA (°C) at Various Airflows

Rows: TA with Heat Sink*; TA without Heat Sink

Airflow, ft/min (m/sec):
0 (0) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) | 1000 (5.07)


779 
ra28 [493 


75.5 78.5 


S| 613] 60 | 65 | 


*Nine-fin unidirectional heat sink (fin dimensions: 0.350" height, 0.040" width,
0.115" center-to-center spacing, 1.530" length).


7.0 ELECTRICAL DATA


Inputs and outputs are TTL compatible, except for CLK. All
input and output timings are specified relative to the 1.5
volt level of the rising edge of CLK and refer to the point
that the signals reach 1.5V.


7.1 Absolute Maximum Ratings


Case Temperature TC under Bias ......... 0°C to 85°C

Storage Temperature .................... -65°C to +150°C

Voltage on Any Pin
with Respect to Ground ................. -0.5 to 6.5V


7.2 D.C. Characteristics 


NOTICE: This data sheet contains preliminary infor- 
mation on new products in production. The specifica- 
tions are subject to change without notice. Verify with 
your local Intel Sales office that you have the latest 
data sheet before finalizing a design. 


*WARNING: Stressing the device beyond the "Absolute
Maximum Ratings" may cause permanent damage. These are
stress ratings only. Operation beyond the "Operating
Conditions" is not recommended and extended exposure beyond
the "Operating Conditions" may affect device reliability.


Table 7.1. DC Characteristics


TC = 0°C to 85°C, VCC = 5V ±5%


Input LOW Voltage
Input HIGH Voltage
CLK Input LOW Voltage
CLK Input HIGH Voltage
Output LOW Voltage
Output HIGH Voltage
Power Supply Current
CLK = 25.0 MHz
CLK = 33.3 MHz
CLK = 40.0 MHz
Input Leakage Current
Output Leakage Current
Input Capacitance
I/O or Output Capacitance
Clock Capacitance


Notes column, in order of appearance:

(Note 1)
(Note 2)
VCC @ 5V
VCC @ 5V
VCC @ 5V
No pullup or pulldown
(Note 3)
(Note 3)
(Note 3)


NOTES:
1. This parameter is measured at 4.0 mA for A31-A3, D63-D0, BE7#-BE0#; at 5.0 mA for all other outputs.
2. This parameter is measured at 1.0 mA for A31-A3, D63-D0, BE7#-BE0#; at 0.9 mA for all other outputs.
3. These are not tested. They are guaranteed by design characterization.
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7.3 A.C. Characteristics 


Table 7.2. A.C. Characteristics 
To = 0°C to 85°C, Voc = 5V £5% 
All timings measured at CLK = 1.5V = ee otherwise Pao. 


Parameter                                      Notes

CLK Period
CLK High Time
CLK Low Time
CLK Fall Time
CLK Rise Time
A31-A3, PTB, W/R#, NENE# Valid Delay
BEn#* Valid Delay
Float Time, All Outputs
ADS#, BREQ, LOCK#, HLDA Valid Delay
D63-D0 Valid Delay
Setup Time, All Inputs
t11a  Hold Time, All Inputs except DATA        (Note 2)
t11b  DATA Hold Time

NOTES:
NOTES: 


1. Float condition occurs when maximum output current becomes less than ILO in magnitude. Float delay is not tested.

2. INT and HOLD are asynchronous inputs. The setup and hold specifications are given for test purposes or to assure
recognition on a specific rising edge of CLK.

*n = 0, 1, ..., 7


Figure 7.1. CLK, Input, and Output Timings


Axes: typical* output delay (ns) @ 1.5V, marked nom, nom+5, nom+10, nom+15, versus load capacitance CL (pF), 25 to 150.
Curves: A31-A3; BE7#-BE0#; ADS#, BREQ, LOCK#, HLDA.

NOTES:
Graphs are not linear outside the CL range shown.
nom = nominal value given in the AC timing table.
*Typical part under worst-case conditions.


Figure 7.2. Typical Output Delay vs Load Capacitance under Worst-Case Conditions


Axes: typical* output slew time (ns), 0.8V to 2.0V, versus load capacitance CL (pF), 25 to 150.
Curves: ADS#, BREQ, LOCK#, HLDA; W/R#, NENE#.

NOTES:
Graphs are not linear outside the CL range shown.
*Typical part under worst-case conditions.


Figure 7.3. Typical Slew Time vs Load Capacitance under Worst-Case Conditions


Axes: ICC versus CLK frequency (MHz), 8 to 40.

NOTES:
Graphs are not linear outside the frequency range shown.
*Worst-case supply current at 5V.


Figure 7.4. Typical ICC vs Frequency


8.0 INSTRUCTION SET


Key to abbreviations: 


For register operands, the abbreviations that describe the operands are composed of two parts. The first part
describes the type of register:


c One of the control registers fir, psr, epsr, dirbase, db, or fsr
f One of the floating-point registers: f0 through f31
i One of the integer registers: r0 through r31


The second part identifies the field of the machine instruction into which the operand is to be placed:


src1 The first of the two source-register designators, which may be either a register or a 16-bit
immediate constant or address offset. The immediate value is zero-extended for logical
operations and is sign-extended for add and subtract operations (including addu and subu)
and for all addressing calculations.
src1ni Same as src1 except that no immediate constant or address offset value is permitted.
src1s Same as src1 except that the immediate constant is a 5-bit value that is zero-extended to 32
bits.
src2 The second of the two source-register designators.
dest The destination register designator.


Thus, the operand specifier isrc2, for example, means that an integer register is used and that the encoding of
that register must be placed in the src2 field of the machine instruction.
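
The extension rule for a 16-bit src1 immediate can be sketched in Python as follows. This is an illustrative model only: the function name and the `kind` labels are hypothetical choices made for this sketch, not part of any Intel tool.

```python
def extend_src1_immediate(imm16, kind):
    """Widen a 16-bit src1 immediate to a 32-bit value.

    kind is "logical" (zero-extended) or "arithmetic" (sign-extended; covers
    add, subtract, and addressing calculations). Both labels are names
    invented for this sketch.
    """
    imm16 &= 0xFFFF
    if kind == "logical":
        return imm16
    if kind == "arithmetic":
        # Flip the sign bit, subtract it back, then wrap to 32 bits.
        return ((imm16 ^ 0x8000) - 0x8000) & 0xFFFFFFFF
    raise ValueError("kind must be 'logical' or 'arithmetic'")
```

Under this model the same bit pattern 0xFFFF yields 0x0000FFFF for a logical operation and 0xFFFFFFFF for an add or an address calculation.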


Other (nonregister) operands are specified by a one-part abbreviation that represents both the type of operand
required and the instruction field into which the value of the operand is placed:


#const A 16-bit immediate constant or address offset that the i860 XR microprocessor sign-extends
to 32 bits when computing the effective address.
lbroff A signed, 26-bit, immediate, relative branch offset.
sbroff A signed, 16-bit, immediate, relative branch offset.
brx A function that computes the target address by shifting the offset (either lbroff or sbroff) left
by two bits, sign-extending it to 32 bits, and adding the result to the current instruction pointer
plus four. The resulting target address may lie anywhere within the address space.
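
As a rough illustration of the brx computation, the following Python sketch models the offset as an unsigned field of a given width. The function names and the `offset_bits` parameter are assumptions of this sketch; sign-extending before shifting gives the same result as the shift-then-extend order described above.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def brx(ip, offset, offset_bits):
    """Branch target for an offset field (26 bits for lbroff, 16 for sbroff).

    ip is the address of the branch instruction; the result wraps to 32 bits.
    """
    displacement = sign_extend(offset, offset_bits) << 2  # offset counts words
    return (ip + 4 + displacement) & 0xFFFFFFFF
```

For example, an offset field of all ones (minus one) branches back to the branch instruction itself, because the displacement of -4 cancels the "plus four".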


Unless otherwise specified, floating-point operations accept single- or double-precision
source operands and produce a result of equal or greater precision. Both input operands
must have the same precision. The source and result precision are specified by a two-letter
suffix to the mnemonic of the operation.


Other abbreviations include:

.p Precision specification .ss, .sd, or .dd (.ds not permitted). Refer to Table 8.1.
.r Precision specification .ss, .sd, .ds, or .dd. Refer to Table 8.1.
.v .sd or .dd. Refer to Table 8.1.
.w .ss or .dd. Refer to Table 8.1.
.x .b (8 bits), .s (16 bits), or .l (32 bits)
.y .l (32 bits), .d (64 bits), or .q (128 bits)
.z .l (32 bits), or .d (64 bits)
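
A small Python sketch of how the two-letter precision suffix might be checked and decoded is given here. The class letters p, r, v, and w are those attached to the mnemonics in Section 8.1 (for example fadd.p, famov.r, fix.v, fiadd.w); the function and table names are hypothetical.

```python
PRECISION = {"s": "single", "d": "double"}

# Suffixes permitted for each suffix class used in the mnemonics.
ALLOWED = {
    "p": {"ss", "sd", "dd"},
    "r": {"ss", "sd", "ds", "dd"},
    "v": {"sd", "dd"},
    "w": {"ss", "dd"},
}


def decode_precision(suffix, suffix_class):
    """Return (source precision, result precision) for a suffix such as '.sd'."""
    letters = suffix.lstrip(".").lower()
    if letters not in ALLOWED[suffix_class]:
        raise ValueError(f"suffix .{letters} not permitted for class .{suffix_class}")
    # First letter names the source precision, second the result precision.
    return PRECISION[letters[0]], PRECISION[letters[1]]
```

With this model, `.ds` is accepted for an `.r` instruction but rejected for a `.p` instruction.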


Table 8.1. Precision Specification


Suffix | Source Precision | Result Precision

.ss    | single           | single
.sd    | single           | double
.dd    | double           | double
.ds    | double           | single


mem.x(address) The contents of the memory location indicated by address with a size of x.


PM The pixel mask, which is considered as an array of eight bits PM[7]..PM[0], where PM[0] is
the least significant bit.


8.1 Instruction Definitions in Alphabetical Order 


adds      isrc1, isrc2, idest ................ Add Signed
    idest ← isrc1 + isrc2
    OF ← (bit 31 carry ≠ bit 30 carry)
    CC set if isrc2 < -isrc1 (signed)
    CC clear if isrc2 ≥ -isrc1 (signed)
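
The flag rules for adds can be modeled in a few lines of Python. This sketch assumes that "bit 31 carry" means the carry out of bit 31 and "bit 30 carry" the carry out of bit 30 into bit 31; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def adds(isrc1, isrc2):
    """Return (idest, OF, CC) for a signed add of two 32-bit patterns."""
    total = (isrc1 & MASK32) + (isrc2 & MASK32)
    idest = total & MASK32
    carry31 = total >> 32  # carry out of bit 31
    carry30 = ((isrc1 & 0x7FFFFFFF) + (isrc2 & 0x7FFFFFFF)) >> 31  # carry into bit 31
    overflow = carry31 != carry30
    cc = to_signed(isrc2) < -to_signed(isrc1)
    return idest, overflow, cc
```

In this model CC reflects the sign of the exact (unwrapped) sum, so it stays meaningful even when OF is set.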


addu      isrc1, isrc2, idest ................ Add Unsigned
    idest ← isrc1 + isrc2
    OF ← bit 31 carry
    CC ← bit 31 carry

and       isrc1, isrc2, idest ................ Logical AND
    idest ← isrc1 and isrc2
    CC set if result is zero, cleared otherwise

andh      #const, isrc2, idest ............... Logical AND High
    idest ← (#const shifted left 16 bits) and isrc2
    CC set if result is zero, cleared otherwise

andnot    isrc1, isrc2, idest ................ Logical AND NOT
    idest ← not isrc1 and isrc2
    CC set if result is zero, cleared otherwise

andnoth   #const, isrc2, idest ............... Logical AND NOT High
    idest ← not (#const shifted left 16 bits) and isrc2
    CC set if result is zero, cleared otherwise


bc        lbroff ............................. Branch on CC
    IF CC = 1
    THEN continue execution at brx(lbroff)
    FI

bc.t      lbroff ............................. Branch on CC, Taken
    IF CC = 1
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI

bla       isrc1ni, isrc2, sbroff ............. Branch on LCC and Add
    LCC-temp clear if isrc2 < -isrc1ni (signed)
    LCC-temp set if isrc2 ≥ -isrc1ni (signed)
    isrc2 ← isrc1ni + isrc2
    Execute one more sequential instruction
    IF LCC
    THEN LCC ← LCC-temp
         continue execution at brx(sbroff)
    ELSE LCC ← LCC-temp
    FI
bnc       lbroff ............................. Branch on Not CC
    IF CC = 0
    THEN continue execution at brx(lbroff)
    FI

bnc.t     lbroff ............................. Branch on Not CC, Taken
    IF CC = 0
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI

br        lbroff ............................. Branch Direct Unconditionally
    Execute one more sequential instruction.
    Continue execution at brx(lbroff).

bri       isrc1ni ............................ Branch Indirect Unconditionally
    Execute one more sequential instruction
    IF any trap bit in psr is set
    THEN copy PU to U, PIM to IM in psr
         clear trap bits
         IF DS is set and DIM is reset
         THEN enter dual-instruction mode after executing one
              instruction in single-instruction mode
         ELSE IF DS is set and DIM is set
              THEN enter single-instruction mode after executing one
                   instruction in dual-instruction mode
              ELSE IF DIM is set
                   THEN enter dual-instruction mode
                        for next two instructions
                   ELSE enter single-instruction mode
                        for next two instructions
                   FI
              FI
         FI
    FI
    Continue execution at address in isrc1ni
    (The original contents of isrc1ni is used even if the next instruction
    modifies isrc1ni. Does not trap if isrc1ni is misaligned.)


bte       isrc1s, isrc2, sbroff .............. Branch If Equal
    IF isrc1s = isrc2
    THEN continue execution at brx(sbroff)
    FI

btne      isrc1s, isrc2, sbroff .............. Branch If Not Equal
    IF isrc1s ≠ isrc2
    THEN continue execution at brx(sbroff)
    FI

call      lbroff ............................. Subroutine Call
    r1 ← address of next sequential instruction + 4 (+8 in dual mode)
    Execute one more sequential instruction
    Continue execution at brx(lbroff)

calli     isrc1ni ............................ Indirect Subroutine Call
    r1 ← address of next sequential instruction + 4 (+8 in dual mode)
    Execute one more sequential instruction
    Continue execution at address in isrc1ni
    (The original contents of isrc1ni is used even if the next instruction
    modifies isrc1ni. Does not trap if isrc1ni is misaligned.
    The register isrc1ni must not be r1.)


fadd.p    fsrc1, fsrc2, fdest ................ Floating-Point Add
    fdest ← fsrc1 + fsrc2

faddp     fsrc1, fsrc2, fdest ................ Add with Pixel Merge
    fdest ← fsrc1 + fsrc2
    Shift and load MERGE register as defined in Table 8.2

faddz     fsrc1, fsrc2, fdest ................ Add with Z Merge
    fdest ← fsrc1 + fsrc2
    Shift MERGE right 16 and load fields 31..16 and 63..48
famov.r   fsrc1, fdest ....................... Floating-Point Adder Move
    fdest ← fsrc1
    Send fsrc1 through the floating-point adder. (Preserves -0 (minus zero) when fsrc1 is -0. src2
    must be coded as f0 by the assembler.)

fiadd.w   fsrc1, fsrc2, fdest ................ Long-Integer Add
    fdest ← fsrc1 + fsrc2

fisub.w   fsrc1, fsrc2, fdest ................ Long-Integer Subtract
    fdest ← fsrc1 - fsrc2

fix.v     fsrc1, fdest ....................... Floating-Point to Integer Conversion
    fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1 rounded

fld.y     isrc1(isrc2), fdest ................ Floating-Point Load (Normal)
fld.y     isrc1(isrc2)++, fdest .............. Floating-Point Load (Autoincrement)
    fdest ← mem.y (isrc1 + isrc2)
    IF autoincrement
    THEN isrc2 ← isrc1 + isrc2
    FI

flush     #const(isrc2) ...................... Cache Flush (Normal)
flush     #const(isrc2)++ .................... Cache Flush (Autoincrement)
    Replace block in data cache with address (#const + isrc2).
    Contents of block undefined.
    IF autoincrement
    THEN isrc2 ← #const + isrc2
    FI


fmlow.dd  fsrc1, fsrc2, fdest ................ Floating-Point Multiply Low
    fdest ← low-order 53 bits of fsrc1 mantissa × fsrc2 mantissa
    fdest bit 53 ← most significant bit of mantissa

fmov.r    fsrc1, fdest ....................... Floating-Point Reg-Reg Move
    Assembler pseudo-operation
    fmov.ss fsrc1, fdest = fiadd.ss fsrc1, f0, fdest
    fmov.dd fsrc1, fdest = fiadd.dd fsrc1, f0, fdest
    fmov.sd fsrc1, fdest = famov.sd fsrc1, fdest
    fmov.ds fsrc1, fdest = famov.ds fsrc1, fdest

fmul.p    fsrc1, fsrc2, fdest ................ Floating-Point Multiply
    fdest ← fsrc1 × fsrc2

fnop      .................................... Floating-Point No Operation
    Assembler pseudo-operation
    fnop = shrd r0, r0, r0


form      fsrc1, fdest ....................... OR with MERGE Register
    fdest ← fsrc1 OR MERGE
    MERGE ← 0

frcp.p    fsrc2, fdest ....................... Floating-Point Reciprocal
    fdest ← 1/fsrc2 with maximum mantissa error < 2^-7

frsqr.p   fsrc2, fdest ....................... Floating-Point Reciprocal Square Root
    fdest ← 1/SQRT(fsrc2) with maximum mantissa error < 2^-7

fst.y     fdest, isrc1(isrc2) ................ Floating-Point Store (Normal)
fst.y     fdest, isrc1(isrc2)++ .............. Floating-Point Store (Autoincrement)
    mem.y (isrc2 + isrc1) ← fdest
    IF autoincrement
    THEN isrc2 ← isrc1 + isrc2
    FI
fsub.p    fsrc1, fsrc2, fdest ................ Floating-Point Subtract
    fdest ← fsrc1 - fsrc2

ftrunc.v  fsrc1, fdest ....................... Floating-Point to Integer Conversion
    fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1

fxfr      fsrc1, idest ....................... Transfer F-P to Integer Register
    idest ← fsrc1

fzchkl    fsrc1, fsrc2, fdest ................ 32-Bit Z-Buffer Check
    Consider fsrc1, fsrc2, and fdest as arrays of two 32-bit
    fields fsrc1(0)..fsrc1(1), fsrc2(0)..fsrc2(1), and fdest(0)..fdest(1)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 2 bits
    FOR i = 0 to 1
    DO
        PM[i + 6] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
        fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0

fzchks    fsrc1, fsrc2, fdest ................ 16-Bit Z-Buffer Check
    Consider fsrc1, fsrc2, and fdest as arrays of four 16-bit
    fields fsrc1(0)..fsrc1(3), fsrc2(0)..fsrc2(3), and fdest(0)..fdest(3)
    where zero denotes the least-significant field.
    PM ← PM shifted right by 4 bits
    FOR i = 0 to 3
    DO
        PM[i + 4] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
        fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0
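
The 16-bit Z-buffer check can be illustrated with a short Python model operating on 64-bit integers. This is a sketch under stated assumptions: the comparison is taken to be less-than-or-equal (substitute a strict comparison if your copy of the data sheet shows one), the MERGE side effect is omitted, and the function name is hypothetical.

```python
def fzchks(fsrc1, fsrc2, pm):
    """Model of a 16-bit Z-buffer check; returns (fdest, new PM).

    fsrc1 and fsrc2 are 64-bit integers holding four 16-bit fields each,
    field 0 being the least significant. pm is the 8-bit pixel mask.
    """
    pm = (pm & 0xFF) >> 4  # make room for four new result bits
    fdest = 0
    for i in range(4):
        shift = 16 * i
        old_z = (fsrc1 >> shift) & 0xFFFF
        new_z = (fsrc2 >> shift) & 0xFFFF
        if new_z <= old_z:  # unsigned compare; assumed to include equality
            pm |= 1 << (i + 4)
        fdest |= min(new_z, old_z) << shift
    return fdest, pm
```

Each call consumes four Z values, so two successive calls fill all eight PM bits.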

intovr    .................................... Trap on Integer Overflow
    If OF in epsr = 1, generate trap with IT set in psr.

ixfr      isrc1ni, fdest ..................... Transfer Integer to F-P Register
    fdest ← isrc1ni

ld.c      csrc2, idest ....................... Load from Control Register
    idest ← csrc2

ld.x      isrc1(isrc2), idest ................ Load Integer
    idest ← mem.x (isrc1 + isrc2)

lock      .................................... Begin Interlocked Sequence
    Set BL in dirbase. The next load or store that misses the cache locks that location.
    Disable interrupts until the bus is unlocked.

mov       isrc2, idest ....................... Register-Register Move
    Assembler pseudo-operation
    mov isrc2, idest = shl r0, isrc2, idest


mov       const32, idest ..................... Constant-to-Register Move
    Assembler pseudo-operation
    adds l%const32, r0, idest
        when const32 < 0x8000
    orh h%const32, r0, idest
    or l%const32, idest, idest
        when const32 ≥ 0x8000
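
The choice between the one-instruction and two-instruction expansions can be checked with a small Python model. It assumes that h% and l% select the high and low 16 bits of the constant, and it models adds, orh, and or only as far as this expansion needs; all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def expand_mov(const32):
    """Return the (mnemonic, 16-bit immediate) pairs for mov const32, idest."""
    const32 &= MASK32
    if const32 < 0x8000:
        return [("adds", const32)]  # l%const32 alone is enough
    return [("orh", const32 >> 16), ("or", const32 & 0xFFFF)]


def simulate(sequence):
    """Model the effect of an expansion on idest, with r0 reading as zero."""
    idest = 0
    for mnemonic, imm16 in sequence:
        if mnemonic == "adds":
            # adds sign-extends its immediate and adds it to r0.
            idest = ((imm16 ^ 0x8000) - 0x8000) & MASK32
        elif mnemonic == "orh":
            # orh shifts its immediate left 16 bits and ORs it with r0.
            idest = (imm16 << 16) & MASK32
        elif mnemonic == "or":
            # or zero-extends its immediate and ORs it with idest.
            idest |= imm16
    return idest
```

The model also shows why the single adds form is limited to constants below 0x8000: a larger low half would be sign-extended, so `simulate([("adds", 0x8000)])` gives 0xFFFF8000 rather than 0x00008000.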


nop       .................................... Core-Unit No Operation
    Assembler pseudo-operation
    nop = shl r0, r0, r0

or        isrc1, isrc2, idest ................ Logical OR
    idest ← isrc1 OR isrc2
    CC set if result is zero, cleared otherwise

orh       #const, isrc2, idest ............... Logical OR High
    idest ← (#const shifted left 16 bits) OR isrc2
    CC set if result is zero, cleared otherwise

pfadd.p   fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Add
    fdest ← last stage Adder result
    Advance A pipeline one stage
    A pipeline first stage ← fsrc1 + fsrc2

pfaddp    fsrc1, fsrc2, fdest ................ Pipelined Add with Pixel Merge
    fdest ← last stage Graphics result
    last stage Graphics result ← fsrc1 + fsrc2
    Shift and load MERGE register from last stage Graphics result as defined in Table 8.2

pfaddz    fsrc1, fsrc2, fdest ................ Pipelined Add with Z Merge
    fdest ← last stage Graphics result
    last stage Graphics result ← fsrc1 + fsrc2
    Shift MERGE right 16 and load fields 31..16 and 63..48 from last stage Graphics result

pfam.p    fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Add and Multiply
    fdest ← last stage Adder result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 + A-op2
    M pipeline first stage ← M-op1 × M-op2

pfamov.r  fsrc1, fdest ....................... Pipelined Floating-Point Adder Move
    fdest ← last stage Adder result
    Advance A pipeline one stage
    A pipeline first stage ← fsrc1


pfeq.p    fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Equal Compare
    fdest ← last stage Adder result
    CC set if fsrc1 = fsrc2, else cleared
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs

pfgt.p    fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Greater-Than Compare
    (Assembler clears R-bit of instruction)
    fdest ← last stage Adder result
    CC set if fsrc1 > fsrc2, else cleared
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs

pfiadd.w  fsrc1, fsrc2, fdest ................ Pipelined Long-Integer Add
    fdest ← last stage Graphics result
    last stage Graphics result ← fsrc1 + fsrc2

pfisub.w  fsrc1, fsrc2, fdest ................ Pipelined Long-Integer Subtract
    fdest ← last stage Graphics result
    last stage Graphics result ← fsrc1 - fsrc2

pfix.v    fsrc1, fdest ....................... Pipelined Floating-Point to Integer Conversion
    fdest ← last stage Adder result
    Advance A pipeline one stage
    A pipeline first stage ← 64-bit value with low-order 32 bits
    equal to integer part of fsrc1 rounded

pfld.z    isrc1(isrc2), fdest ................ Pipelined Floating-Point Load (Normal)
pfld.z    isrc1(isrc2)++, fdest .............. Pipelined Floating-Point Load (Autoincrement)
    fdest ← mem.z (third previous pfld's (isrc1 + isrc2))
    (where .z is precision of third previous pfld.z)
    IF autoincrement
    THEN isrc2 ← isrc1 + isrc2
    FI
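
The "third previous pfld" behavior can be pictured with a simple queue model in Python. This is only a sketch: it assumes memory is read when a load leaves the pipeline rather than when it enters, it ignores operand precision, and the class and method names are invented for the example.

```python
from collections import deque


class PfldPipeline:
    """Toy model of a three-deep pipelined load queue."""

    DEPTH = 3

    def __init__(self):
        # None marks a stage that holds no load yet.
        self.pending = deque([None] * self.DEPTH)

    def pfld(self, memory, address):
        """Issue a load for address; return data of the third previous pfld."""
        oldest = self.pending.popleft()
        self.pending.append(address)
        return None if oldest is None else memory[oldest]
```

In a loop that streams an array, the first three pfld results are therefore discarded, and three extra pfld instructions are needed at the end to drain the last values.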


pfle.p    fsrc1, fsrc2, fdest ................ Pipelined F-P Less-Than or Equal Compare
    Assembler pseudo-operation, identical to pfgt.p except that
    assembler sets R-bit of instruction.
    fdest ← last stage Adder result
    CC clear if fsrc1 ≤ fsrc2, else set
    Advance A pipeline one stage
    A pipeline first stage is undefined, but no result exception occurs

pfmam.p   fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Add and Multiply
    fdest ← last stage Multiplier result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 + A-op2
    M pipeline first stage ← M-op1 × M-op2

pfmov.r   fsrc1, fdest ....................... Pipelined Floating-Point Reg-Reg Move
    Assembler pseudo-operation
    pfmov.ss fsrc1, fdest = pfiadd.ss fsrc1, f0, fdest
    pfmov.dd fsrc1, fdest = pfiadd.dd fsrc1, f0, fdest
    pfmov.sd fsrc1, fdest = pfamov.sd fsrc1, fdest
    pfmov.ds fsrc1, fdest = pfamov.ds fsrc1, fdest

pfmsm.p   fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Subtract and Multiply
    fdest ← last stage Multiplier result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 - A-op2
    M pipeline first stage ← M-op1 × M-op2

pfmul.p   fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Multiply
    fdest ← last stage Multiplier result
    Advance M pipeline one stage
    M pipeline first stage ← fsrc1 × fsrc2

pfmul3.dd fsrc1, fsrc2, fdest ................ Three-Stage Pipelined Multiply
    fdest ← last stage Multiplier result
    Advance 3-Stage M pipeline one stage
    M pipeline first stage ← fsrc1 × fsrc2


pform     fsrc1, fdest ....................... Pipelined OR to MERGE Register
    fdest ← last stage Graphics result
    last stage Graphics result ← fsrc1 OR MERGE
    MERGE ← 0

pfsm.p    fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Subtract and Multiply
    fdest ← last stage Adder result
    Advance A and M pipeline one stage (operands accessed before advancing pipeline)
    A pipeline first stage ← A-op1 - A-op2
    M pipeline first stage ← M-op1 × M-op2

pfsub.p   fsrc1, fsrc2, fdest ................ Pipelined Floating-Point Subtract
    fdest ← last stage Adder result
    Advance A pipeline one stage
    A pipeline first stage ← fsrc1 - fsrc2

pftrunc.v fsrc1, fdest ....................... Pipelined Floating-Point to Integer Conversion
    fdest ← last stage Adder result
    Advance A pipeline one stage
    A pipeline first stage ← 64-bit value with low-order 32 bits
    equal to integer part of fsrc1


pfzchkl   fsrc1, fsrc2, fdest ................ Pipelined 32-Bit Z-Buffer Check
    Consider fsrc1, fsrc2, and fdest as arrays of two 32-bit
    fields fsrc1(0)..fsrc1(1), fsrc2(0)..fsrc2(1), and fdest(0)..fdest(1)
    where zero denotes the least significant field.
    PM ← PM shifted right by 2 bits
    FOR i = 0 to 1
    DO
        PM[i + 6] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
        fdest(i) ← last stage Graphics result
        last stage Graphics result ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0

pfzchks   fsrc1, fsrc2, fdest ................ Pipelined 16-Bit Z-Buffer Check
    Consider fsrc1, fsrc2, and fdest as arrays of four 16-bit
    fields fsrc1(0)..fsrc1(3), fsrc2(0)..fsrc2(3), and fdest(0)..fdest(3)
    where zero denotes the least significant field.
    PM ← PM shifted right by 4 bits
    FOR i = 0 to 3
    DO
        PM[i + 4] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
        fdest(i) ← last stage Graphics result
        last stage Graphics result ← smaller of fsrc2(i) and fsrc1(i)
    OD
    MERGE ← 0

pst.d     fdest, #const(isrc2) ............... Pixel Store
pst.d     fdest, #const(isrc2)++ ............. Pixel Store Autoincrement
    Pixels enabled by PM in mem.d (isrc2 + #const) ← fdest
    Shift PM right by 8/pixel size (in bytes) bits
    IF autoincrement
    THEN isrc2 ← #const + isrc2
    FI
shl       isrc1, isrc2, idest ................ Shift Left
    idest ← isrc2 shifted left by isrc1 bits

shr       isrc1, isrc2, idest ................ Shift Right
    SC (in psr) ← isrc1
    idest ← isrc2 shifted right by isrc1 bits

shra      isrc1, isrc2, idest ................ Shift Right Arithmetic
    idest ← isrc2 arithmetically shifted right by isrc1 bits

shrd      isrc1, isrc2, idest ................ Shift Right Double
    idest ← low-order 32 bits of isrc1:isrc2 shifted right by SC bits
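
The double shift can be modeled in Python by treating isrc1 and isrc2 as the high and low halves of one 64-bit quantity. This sketch assumes the shift count SC is a 5-bit value previously loaded by shr; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shrd(isrc1, isrc2, sc):
    """Low-order 32 bits of the 64-bit pair isrc1:isrc2 shifted right by sc."""
    combined = ((isrc1 & MASK32) << 32) | (isrc2 & MASK32)
    return (combined >> (sc & 0x1F)) & MASK32  # SC assumed to be 5 bits wide
```

Passing the same register as both sources yields a rotate right, which is one common use of a shr followed by shrd pair.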

st.c      isrc1ni, csrc2 ..................... Store to Control Register
    csrc2 ← isrc1ni

st.x      isrc1ni, #const(isrc2) ............. Store Integer
    mem.x (isrc2 + #const) ← isrc1ni

subs      isrc1, isrc2, idest ................ Subtract Signed
    idest ← isrc1 - isrc2
    OF ← (bit 31 carry ≠ bit 30 carry)
    CC set if isrc2 > isrc1 (signed)
    CC clear if isrc2 ≤ isrc1 (signed)

subu      isrc1, isrc2, idest ................ Subtract Unsigned
    idest ← isrc1 - isrc2
    OF ← NOT (bit 31 carry)
    CC ← bit 31 carry
    (i.e. CC set if isrc2 ≤ isrc1 (unsigned)
    CC clear if isrc2 > isrc1 (unsigned))

trap      isrc1ni, isrc2, idest .............. Software Trap
    Generate trap with IT set in psr

unlock    .................................... End Interlocked Sequence
    Clear BL in dirbase. The next load or store unlocks the bus.
    Enable interrupts after bus is unlocked.


xor       isrc1, isrc2, idest ................ Logical Exclusive OR
    idest ← isrc1 XOR isrc2
    CC set if result is zero, cleared otherwise

xorh      #const, isrc2, idest ............... Logical Exclusive OR High
    idest ← (#const shifted left 16 bits) XOR isrc2
    CC set if result is zero, cleared otherwise


Table 8.2. FADDP MERGE Update


Pixel Size    Fields Loaded From                  Right Shift Amount
(from PS)     Result into MERGE                   (Field Size)

8             63..56, 47..40, 31..24, 15..8       8
16            63..58, 47..42, 31..26, 15..10      6
32            63..56, 31..24                      8


8.2 Instruction Format and Encoding 


All instructions are 32 bits long and begin on a four-byte
boundary. When operands are registers, the
register encodings shown in Table 8.3 are used.
There are two general core-instruction formats,
REG-format and CTRL-format, as well as a separate
format for floating-point instructions.


8.2.1 REG-FORMAT INSTRUCTIONS


Within the REG-format are several variations as
shown in Figure 8.1. Table 8.4 gives the encodings
for these instructions. One encoding is an escape
code that defines yet another variation: the core escape
instructions. Figure 8.2 shows the format of
this group, and Table 8.5 shows the encodings.


In these instructions, the src2 field selects one of
the 32 integer registers (most instructions) or six
control registers (st.c and ld.c). Dest selects one of
the 32 integer registers (most instructions) or floating-point
registers (fld, fst, pfld, pst, ixfr). For instructions
where src1 is optionally an immediate value,
bit 26 of the opcode (I-bit) indicates whether src1
is an immediate. If bit 26 is clear, an integer register
is used; if bit 26 is set, src1 is contained in the low-order
16 bits, except for bte and btne instructions.
For bte and btne, the five-bit immediate value is
contained in the src1 field. For st, bte, btne, and
bla, the upper five bits of the offset or broffset are
contained in the dest field instead of src1, and the
lower 11 bits of offset are the lower 11 bits of the
instruction.
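
The field layout described above can be made concrete with a short Python decoder. It is a sketch only: the bit positions follow the boundaries marked in Figure 8.1 (31, 25, 20, 15, 10), and the function names and dictionary keys are hypothetical.

```python
def decode_reg_format(word):
    """Split a 32-bit REG-format instruction word into its raw fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "i_bit": (word >> 26) & 0x1,    # bit 26: set when src1 is an immediate
        "src2": (word >> 21) & 0x1F,    # bits 25..21
        "dest": (word >> 16) & 0x1F,    # bits 20..16
        "src1": (word >> 11) & 0x1F,    # bits 15..11
        "imm16": word & 0xFFFF,         # bits 15..0, used when the I-bit is set
    }


def split_offset(word):
    """Reassemble the 16-bit offset of st, bla, bte, and btne.

    The upper five bits come from the dest field and the lower 11 bits from
    the low end of the instruction; the result is returned unsigned.
    """
    high = (word >> 16) & 0x1F
    low = word & 0x7FF
    return (high << 11) | low
```

Note that `imm16` overlaps the src1 field, so a decoder has to consult the I-bit (and the opcode, for bte and btne) before deciding which interpretation applies.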


Table 8.3. Register Encoding


Register                    Encoding

r0                          0
 .                          .
r31                         31
f0                          0
 .                          .
f31                         31
Fault Instruction           0
Processor Status            1
Directory Base              2
Data Breakpoint             3
Floating-Point Status       4
Extended Process Status     5

For ld and st, bits 28 and zero determine operand
size as follows:


Bit 28   Bit 0   Operand Size

0        0       8-bits
0        1       8-bits
1        0       16-bits
1        1       32-bits


When src1 is an immediate and bit 28 is set, bit zero
of the immediate value is forced to zero.
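
The operand-size rule for ld and st reduces to two bit tests, sketched below in Python. The function name is hypothetical, and the word is assumed to be an already fetched 32-bit ld or st instruction.

```python
def ld_st_operand_bits(word):
    """Operand size in bits for an ld or st instruction word."""
    if not (word >> 28) & 1:
        return 8               # bit 28 clear: byte access, bit 0 is ignored
    return 32 if word & 1 else 16  # bit 28 set: bit 0 picks 32 or 16 bits
```

Because bit 0 doubles as the size selector, the hardware treats it as zero when forming the address from an immediate, which keeps 16- and 32-bit accesses at even addresses.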


For fld, fst, pfld, pst, and flush, bit 0 selects autoincrement
addressing if set. For fld, fst, pfld, and
pst, bits one and two select the operand size as
follows:


Bit 1   Bit 2   Operand Size

0       0       64-bits
0       1       128-bits
1       0       32-bits
1       1       32-bits


When src1 is an immediate value, bits zero and one
of the immediate value are forced to zero to maintain
alignment. When bit one of the immediate value
is clear, bit two is also forced to zero.


For flush, bits one and two must be zero.


General Format

31      26 25    21 20    16 15    11 10           0
OPCODE/I  | SRC2   | DEST   | SRC1   |


16-Bit Immediate Variant (except bte and btne)

31      26 25    21 20    16 15                    0
OPCODE/I  | SRC2   | DEST   | IMMEDIATE


st, bla, bte, and btne

31      26 25    21 20    16 15    11 10           0
OPCODE/I  | SRC2   | OFFSET | SRC1   | OFFSET LOW
          |        | HIGH   | SRC1S  |


bte and btne with 5-Bit Immediate

31      26 25    21 20    16 15    11 10           0
OPCODE/I  | SRC2   | OFFSET | IMMEDIATE | OFFSET LOW
          |        | HIGH   |


Figure 8.1. REG-Format Variations


Table 8.4. REG-Format Opcodes


                                                       31             26
ld.x                Load Integer                        0  0  0  L  0  I
st.x                Store Integer                       0  0  0  L  1  1
ixfr                Integer to F-P Reg Transfer         0  0  0  0  1  0
                    (reserved)                          0  0  0  1  1  0
fld.x, fst.x        Load/Store F-P                      0  0  1  0  LS I
flush               Flush                               0  0  1  1  0  1
pst.d               Pixel Store                         0  0  1  1  1  1
ld.c, st.c          Load/Store Control Register         0  0  1  1  LS 0
bri                 Branch Indirect                     0  1  0  0  0  0
trap                Trap                                0  1  0  0  0  1
                    (Escape for F-P Unit)               0  1  0  0  1  0
                    (Escape for Core Unit)              0  1  0  0  1  1
bte, btne           Branch Equal or Not Equal           0  1  0  1  E  I
pfld.y              Pipelined F-P Load                  0  1  1  0  0  I
                    (CTRL-Format Instructions)          0  1  1  x  x  x
addu, -s, subu, -s  Add/Subtract                        1  0  0  SO AS I
shl, shr            Logical Shift                       1  0  1  0  LR I
shrd                Double Shift                        1  0  1  1  0  0
bla                 Branch LCC Set and Add              1  0  1  1  0  1
shra                Arithmetic Shift                    1  0  1  1  1  I
and(h)              AND                                 1  1  0  0  H  I
andnot(h)           ANDNOT                              1  1  0  1  H  I
or(h)               OR                                  1  1  1  0  H  I
xor(h)              XOR                                 1  1  1  1  H  I
                    (reserved)                          1  1  1  1  1  0

L   Integer Length                            AS  Add/Subtract
    0 - 8 bits                                    0 - Add
    1 - 16 or 32 bits (selected by bit 0)         1 - Subtract
LS  Load/Store                                LR  Left/Right
    0 - Load                                      0 - Left Shift
    1 - Store                                     1 - Right Shift
SO  Signed/Ordinal                            E   Equal
    0 - Ordinal                                   0 - Branch on Not Equal
    1 - Signed                                    1 - Branch on Equal
H   High                                      I   Immediate
    0 - and, or, andnot, xor                      0 - src1 is register
    1 - andh, orh, andnoth, xorh                  1 - src1 is immediate


31 26 15 10 5 0 


*reserved (must be set to zero by assemblers) 


Figure 8.2. Core Escape Instruction Format 


Table 8.5. Core Escape Opcodes


                                               4           0
          (reserved)                           0  0  0  0  0
lock      Begin Interlocked Sequence           0  0  0  0  1
calli     Indirect Subroutine Call             0  0  0  1  0
          (reserved)                           0  0  0  1  1
intovr    Trap on Integer Overflow             0  0  1  0  0
          (reserved)                           0  0  1  0  1
          (reserved)                           0  0  1  1  0
unlock    End Interlocked Sequence             0  0  1  1  1
          (reserved)                           0  1  x  x  x
          (reserved)                           1  0  x  x  x
          (reserved)                           1  1  x  x  x


8.2.2 CTRL-FORMAT INSTRUCTIONS 


The CTRL instructions do not refer to registers, so instead of the register fields, they have a 26-bit relative 
branch offset. Figure 8.3 shows the format of these instructions and Table 8.6 defines the encodings. 


31    29 28    26 25                                 0
0  1  1 |  OPC   |             BROFFSET

BROFFSET is a signed 26-bit relative branch offset.


Figure 8.3. CTRL Instruction Format 


Table 8.6. CTRL-Format Opcodes


                                    28      26
          (reserved)                 0   0   0
          (reserved)                 0   0   1
br        Branch Direct              0   1   0
call      Call                       0   1   1
bc(.t)    Branch on CC Set           1   0   T
bnc(.t)   Branch on CC Clear         1   1   T

T   Taken
    0 - bc or bnc
    1 - bc.t or bnc.t
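
Decoding a CTRL-format word follows directly from Table 8.6 and the 26-bit offset field. The Python sketch below assumes the leading bits 31..29 are 011, as listed for CTRL-format instructions in Table 8.4; the function and table names are hypothetical.

```python
CTRL_OPCODES = {
    0b010: "br",
    0b011: "call",
    0b100: "bc",
    0b101: "bc.t",
    0b110: "bnc",
    0b111: "bnc.t",
}


def decode_ctrl(word):
    """Return (mnemonic, signed word offset) for a CTRL-format instruction."""
    if (word >> 29) & 0x7 != 0b011:
        raise ValueError("not a CTRL-format instruction")
    opcode = (word >> 26) & 0x7          # bits 28..26
    offset = word & 0x3FFFFFF            # bits 25..0
    if offset & 0x2000000:               # sign bit of the 26-bit field
        offset -= 1 << 26
    return CTRL_OPCODES.get(opcode, "(reserved)"), offset
```

The returned offset counts instructions, not bytes; the byte displacement is obtained by the brx computation described in the key to abbreviations.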
8.2.3 FLOATING-POINT INSTRUCTIONS


The floating-point instructions also constitute an escape series. All these instructions begin with the bit
sequence 010010. Figure 8.4 shows the format of the floating-point instructions, and Table 8.7 gives the
encodings. Within the dual-operation instructions is a subcode DPC whose values are given in Table 8.8 along
with the mnemonic that corresponds to each.


SRC1, SRC2   Source; one of 32 floating-point registers
DEST         Destination register
             (instructions other than fxfr) one of 32 floating-point registers
             (fxfr) one of 32 integer registers

P    Pipelining
     1 = Pipelined instruction mode
     0 = Scalar instruction mode
D    Dual-Instruction Mode
     1 = Dual-instruction mode
     0 = Single-instruction mode
S    Source Precision
     1 = Double-precision source operands
     0 = Single-precision source operands
R    Result Precision
     1 = Double-precision result
     0 = Single-precision result


Figure 8.4. Floating-Point Instruction Encoding
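A decoder can recognize the floating-point escape series by testing the leading bit sequence 010010. The Python sketch below assumes that "begin with" means the six most-significant bits (31..26) of the 32-bit instruction word, in line with the bit numbering used for the other formats; is_fp_escape is a hypothetical helper name.

```python
FP_ESCAPE = 0b010010  # leading bit sequence of the floating-point series


def is_fp_escape(word: int) -> bool:
    """True if bits 31..26 of a 32-bit instruction word equal 010010."""
    return (word >> 26) & 0x3F == FP_ESCAPE


if __name__ == "__main__":
    print(is_fp_escape(0x48000000))  # leading bits are 010010
```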


Table 8.7. Floating-Point Opcodes

pfam           Add and Multiply*
pfmam          Multiply with Add*
pfsm           Subtract and Multiply*
pfmsm          Multiply with Subtract*
(p)fmul        Multiply
fmlow          Multiply Low
frcp           Reciprocal
frsqr          Reciprocal Square Root
pfmul3.dd      3-Stage Pipelined Multiply
(p)fadd        Add
(p)fsub        Subtract
(p)fix         Fix
(p)famov       Adder Move
pfgt/pfle**    Greater Than
pfeq           Equal
(p)ftrunc      Truncate
fxfr           Transfer to Integer Register
(p)fiadd       Long-Integer Add
(p)fisub       Long-Integer Subtract
(p)fzchkl      Z-Check Long
(p)fzchks      Z-Check Short
(p)faddp       Add with Pixel Merge
(p)faddz       Add with Z Merge
(p)form        OR with MERGE Register

*pfam and pfsm have P-bit set; pfmam and pfmsm have P-bit clear.
**pfgt has R bit cleared; pfle has R bit set.

NOTE:
All opcodes not shown are reserved.
The following table shows the opcode mnemonics that generate the various encodings of DPC and explains
each encoding.
T | K- 
Load |. Load* 


Table 8.8. DPC Encoding 


PFAM PFSM M-Unit M-Unit 
DPC ; : 
Mnemonic Mnemonic op1 op2 


src1 | Mresult ' No No 
Mresult |° No Yes 
A result Yes No 


A result Yes Yes 


M result No No 
M result No Yes > 
A result Yes No 
A result Yes | Yes 
ratis2 A result src2 Yes No 
m12apm mi2asm src2 A result M result No No 
raip2 rais2 A result src src2 No No 
m12ttpa m12ttsa src2 A result Yes No 


iatip2 iatis2 A result src2 Yes No 
mi2tpm mi12tsm src2 M result No No 
_ faip2 iais2 A result src2 No No 
mi12tsa src2 A result No on 
T | K 
Load Load* 
M result No No. 


Mresult | . No Yes 
M result Yes No 
M result Yes Yes 


Mresult | No No 
M result No Yes 
M result No 
M result Yes 


mrmtip2 M result src2 No 
mmi12mpm src2 M result No 
mrm1p2 M result | src2 No 
mmi2ttpm src2 A result No 


mimt1p2 mimtis2 M result src2 
mmi2tpm mmi2tsm src2 ‘M result 
mimip2 mim1s2 M result 

| Intel-Reserved 


*If K-load is set, KR is loaded when operand-1 of the multiplier is KR; KI is loaded when operand-1 of the multiplier is KI.
8.3 Instruction Timings

i860 XR microprocessor instructions take one clock to execute unless a freeze condition is invoked.
Freeze conditions and their associated delays are shown in the table below. Freezes due to multiple
simultaneous cache misses result in a delay that is the sum of the delays for processing each miss by
itself. Other multiple freeze conditions usually add only the delay of the longest individual freeze.
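The rule for combining freezes can be sketched as a small calculation. This Python example is one reading of the paragraph above, not a timing model: it assumes cache-miss delays are summed and that all other simultaneous freezes together contribute only their longest delay (the text says "usually", so real behavior may differ). The function name combined_freeze_delay is hypothetical.

```python
def combined_freeze_delay(cache_miss_delays, other_delays):
    """Estimate total freeze clocks for simultaneous freeze conditions.

    cache_miss_delays: clocks for each simultaneous cache miss (summed).
    other_delays: clocks for each other freeze (only the longest counts).
    """
    return sum(cache_miss_delays) + max(other_delays, default=0)


if __name__ == "__main__":
    # Two cache misses of 6 and 4 clocks plus two one-clock freezes.
    print(combined_freeze_delay([6, 4], [1, 1]))
```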


Freeze Condition and Delay

Instruction-cache miss
    Delay: Number of clocks to read instruction (from ADS# clock to first READY# clock) plus time to
    last READY# of block when jump or freeze occurs during miss processing plus two clocks if data
    cache being accessed when instruction-cache miss occurs.

Reference to destination of ld instruction that misses
    Delay: One plus number of clocks to read data (from ADS# clock to first READY# clock) minus number
    of instructions executed since load (not counting instruction that references load destination).

fld miss
    Delay: One plus number of clocks until first READY# returned (for 32- or 64-bit read cycles) or
    until second READY# returned (for 128-bit fld.q read cycles).

call, calli, ixfr, fxfr, ld.c, or st.c and data cache load miss processing in progress
    Delay: One plus number of clocks until first READY# returned (for 64-bit read cycles) or until
    second READY# returned (for 128-bit fld.q read cycles).

ld/st/pfld/fld/fst and data cache load miss processing in progress
    Delay: One plus number of clocks until last READY# returned.

Reference to dest of ld, call, calli, fxfr, or ld.c in the next instruction. (Dest of call and calli is r1.)
    Delay: One clock.
Reference to dest of fld/pfld/ixfr in the next two instructions
    Delay: Two clocks in the first instruction; one in the second instruction.

bc/bnc/bc.t/bnc.t following addu/adds/subu/subs/pfeq/pfle/pfgt
    Delay: One clock.

fsrc1 of multiplier operation refers to result of previous operation
    Delay: One clock.

Floating-point operation or graphics-unit instruction or fst, and scalar operation in progress other
than frcp or frsqr
    Delay: If the scalar operation is fadd, fix, fmlow, fmul.ss, fmul.sd, ftrunc, or fsub, two minus
    the number of instructions (or dual-mode pairs) already executed after the scalar operation. If
    the scalar operation is fmul.dd, three minus the number of instructions (or dual-mode pairs)
    executed after it. Add one if either or both of these two situations occur:

    1. There is an overlap between the result register of the previous scalar operation and the
       source of the floating-point operation, and the destination precision of the scalar operation
       is different than the source precision of the floating-point operation.

    2. The floating-point operation is pipelined and its destination is not f0.

    There is no delay if the result is negative.

Multiplier operation preceded by a double-precision multiply
    Delay: One clock.

TLB miss
    Delay: Five plus the number of clocks to finish two reads plus the number of clocks to set A-bits
    (if necessary).

pfld when three pfld's are outstanding
    Delay: One plus the number of clocks to return data from first pfld.

pfld hits in the data cache
    Delay: Two plus the number of clocks to finish all outstanding accesses.

st, pst or fst miss, ld miss, or flush with modified block when store path full (two stores or one
256-bit write-back internally waiting for bus plus external bus pipeline full)
    Delay: One plus the number of clocks until READY# active on next 64-bit write cycle or second
    READY# of next 128-bit write cycle.

ld, fld, pfld, st, pst, or fst when address path full (one address internally waiting for bus plus
external bus pipeline full)
    Delay: Number of clocks until next nonrepeated address can be issued (i.e., an address that is not
    the 2nd-4th cycle of a cache fill, the 2nd-8th cycle of a CS8 mode instruction fetch, nor the 2nd
    cycle of a 128-bit write).

ld/fld following st/fst hit
    Delay: One clock.
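The freeze delay for a floating-point operation that follows a scalar operation still in progress is the one entry in this table that involves arithmetic, so a worked sketch may help. The Python function below follows the rule as stated in the table, with two assumptions that are not spelled out in the text: the "add one" adjustment is applied before the result is tested for being negative, and a non-positive result means no freeze. All names are hypothetical.

```python
# Scalar operations with a base of two clocks; fmul.dd has a base of three.
TWO_CLOCK_OPS = {"fadd", "fix", "fmlow", "fmul.ss", "fmul.sd", "ftrunc", "fsub"}


def scalar_freeze_delay(scalar_op, executed_since,
                        precision_overlap=False,
                        pipelined_nonzero_dest=False):
    """Clocks of freeze for an operation issued after scalar_op.

    executed_since: instructions (or dual-mode pairs) already executed
        after the scalar operation.
    precision_overlap: situation 1 (register overlap with a precision mismatch).
    pipelined_nonzero_dest: situation 2 (pipelined operation, destination not f0).
    """
    if scalar_op == "fmul.dd":
        base = 3
    elif scalar_op in TWO_CLOCK_OPS:
        base = 2
    else:
        raise ValueError(f"rule does not cover scalar operation {scalar_op!r}")
    delay = base - executed_since
    if precision_overlap or pipelined_nonzero_dest:
        delay += 1  # added once, even if both situations occur
    return max(delay, 0)  # a negative result means no delay


if __name__ == "__main__":
    print(scalar_freeze_delay("fmul.dd", 1, precision_overlap=True))
```

Calling scalar_freeze_delay("frcp", 0) raises ValueError, since frcp and frsqr are excluded from this freeze condition.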
Delayed branch not taken
    Delay: One clock.

Nondelayed branch taken:
    bc, bnc
        Delay: One clock.
    bte, btne
        Delay: Two clocks.

Indirect branch bri or calli
    Delay: One clock.

st.c
    Delay: Two clocks.

Result of graphics-unit instruction (other than fmov.dd) used in next instruction when the next
instruction is an adder- or multiplier-unit instruction
    Delay: One clock.

Result of graphics-unit instruction used in next instruction when the next instruction is a
graphics-unit instruction
    Delay: One clock.

flush followed by flush
    Delay: Three clocks minus the number of instructions between the two flush instructions. There is
    no delay if the result is negative.

fst or pst followed by pipelined floating-point operation that overwrites the register being stored
    Delay: One clock.


8.4 Instruction Characteristics


The following table lists some of the characteristics of each instruction. The characteristics are:

• What processing unit executes the instruction. The codes for processing units are:
      A    Floating-point adder unit
      E    Core execution unit
      G    Graphics unit
      M    Floating-point multiplier unit

• Whether the instruction is pipelined or not. A P indicates that the instruction is pipelined.

• Whether the instruction is a delayed branch instruction. A D marks the delayed branches.

• Whether the instruction changes the condition code CC. A CC marks those instructions that change CC.

• Which faults can be caused by the instruction. The codes used for exceptions are:
      IT     Instruction Fault
      SE     Floating-Point Source Exception
      RE     Floating-Point Result Exception, including overflow, underflow, inexact result
      DAT    Data Access Fault

  Note that this is not the same as specifying at which instructions faults may be reported. A result
  exception is reported on the subsequent floating-point instruction, pst, fst, or sometimes fld,
  pfld, and ixfr.

  The instruction access fault IAT and the interrupt trap IN are not shown in the table because they
  can occur for any instruction.

• Performance notes. These comments regarding optimum performance are recommendations only. If these
  recommendations are not followed, the i860 XR microprocessor automatically waits the necessary number
  of clocks to satisfy internal hardware requirements. The following notes define the numeric codes
  that appear in the instruction table:


  1. The following instruction should not be a conditional branch (bc, bnc, bc.t, or bnc.t).

  2. The destination should not be a source operand of the next two instructions.

  3. A load should not directly follow a store that is expected to hit in the data cache.

  4. When the prior instruction is scalar, fsrc1 should not be the same as the fdest of the prior
     operation.

  5. The fdest should not reference the destination of the next instruction if that instruction is a
     pipelined floating-point operation.

  6. The destination should not be a source operand of the next instruction. (For call and calli, the
     destination is r1.)

  7. When the prior operation is scalar and multiplier op1 is fsrc1, fsrc2 should not be the same as
     the fdest of the prior operation.

  8. When the prior operation is scalar, fsrc1 and fsrc2 of the current operation should not be the
     same as fdest of the prior operation.

  9. A pfld should not immediately follow a pfld.


• Programming restrictions. These indicate combinations of conditions that must be avoided by
  programmers, assemblers, and compilers. The following notes define the alphabetic codes that appear
  in the instruction table:

  a. The sequential instruction following a delayed control-transfer instruction may not be another
     control-transfer instruction (except in the case of external interrupts), nor a trap
     instruction, nor the target of a control-transfer instruction.

  b. When using a bri to return from a trap handler, programmers should take care to prevent traps
     from occurring on that or on the next sequential instruction. IM should be zero (interrupts
     disabled) when the bri is executed.

  c. If fdest is not zero, fsrc1 must not be the same as fdest.

  d. When fsrc1 goes to the multiplier op1, KR, or KI, fsrc1 must not be the same as fdest.

  e. If fdest is not zero, fsrc1 and fsrc2 must not be the same as fdest.

  f. isrc1 must not be the same as isrc2 for the autoincrementing form of this instruction.

  g. isrc1 must not be the same as isrc2.


• Core and Floating-Point Instruction Interaction in Dual-Instruction Mode

  1. If one of the branch-on-condition instructions bc or bnc is paired with a floating-point compare,
     the branch tests the value of the condition code prior to the compare.

  2. If an ixfr, fld, or pfld loads the same register as a source operand in the floating-point
     instruction, the floating-point instruction references the register value before the load
     updates it.

  3. An fst or pst that stores a register that is the destination register of the companion pipelined
     floating-point operation will store the result of the companion operation.

  4. When the core instruction sets CC and the floating-point instruction is pfgt, pfle, or pfeq, CC
     is set according to the result of pfgt, pfle, or pfeq.

  5. When a trap instruction causes a trap in dual-instruction mode, the floating-point instruction
     has neither completed execution nor has updated the FT bit or any result status bits. This is
     not a problem when the trap is inserted by a debugger, because the trap is replaced by the
     original instruction, and the dual-mode pair is reexecuted. However, when the trap is
     programmed, the trap handler must avoid reexecuting the trap by returning to user code at the
     address in fir + 8. In this case, the trap handler must emulate the floating-point instruction
     before returning to the user code. Emulation of the instruction must include all side-effects
     (for example, the effect of its D-bit, effect on the pipelines, and effect on FT and
     result-status bits), just as if the instruction had been executed by the processor in the
     original context.

  6. In dual-instruction mode, when the intovr instruction causes a trap, the floating-point
     companion instruction has completely finished execution before the trap is taken.
• Programming Restrictions for Dual-Instruction Mode

  1. The result of placing a core instruction in the low-order 32 bits or a floating-point
     instruction in the high-order 32 bits is not defined (except for shrd r0, r0, r0 which is
     interpreted as fnop).

  2. A floating-point instruction that has the D-bit set must be aligned on a 64-bit boundary (i.e.,
     the three least-significant bits of its address must be zero). This applies as well to the
     initial 32-bit floating-point instruction that triggers the transition into dual-instruction
     mode, but does not apply to the following instruction.

  3. When the floating-point operation is scalar and the core operation is fst or pst, the store
     should not reference the result register of the floating-point operation. When the core
     operation is pst, the floating-point instruction cannot be (p)fzchks or (p)fzchkl.

  4. When the core instruction of a dual-mode pair is a control-transfer operation and the previous
     instruction had the D-bit set, the floating-point instruction must also have the D-bit set. In
     other words, an exit from dual-instruction mode cannot be initiated (first instruction pair
     without D-bit set) when the core instruction is a control-transfer instruction.

  5. When the core operation is a ld.c or st.c, the floating-point operation must be d.fnop.

  6. When the floating-point operation is fxfr, the core instruction cannot be ld, ld.c, st, st.c,
     call, ixfr, or any instruction that updates an integer register (including autoincrement
     indexing). Furthermore, the core instruction cannot be a fld, fst, pst, or pfld that uses as
     isrc1 or isrc2 the same register as the idest of the fxfr. Additionally, in dual-instruction
     mode, fxfr may not be used in a branch delay slot if its destination register is referenced by
     the preceding branch instruction.

  7. A bri must not be executed in dual-instruction mode if any trap bits are set.

  8. When the core operation is bc.t or bnc.t, the floating-point operation cannot be pfeq or pfgt.
     The floating-point operation in the sequentially following instruction pair cannot be pfeq or
     pfgt, either.

  9. A transition to or from dual-instruction mode cannot be initiated on the instruction following
     a bri.

  10. An ixfr, fld, or pfld cannot update the destination of the companion floating-point
      instruction (unless the destination is f0 or f1) or of the following pipelined floating-point
      instruction (regardless of its destination register). No overlap of register destinations is
      permitted; for example, the following instructions must not be paired:

          // Illegal case 1
          d.fmul.ss   f9, f10, f5
          fld.d       address, f4        ; Overlaps f5

          // Illegal case 2
          d.fmul.ss   f0, f0, f3
          fld.q       address, f0        ; Overlaps f3

          // Illegal case 3
          d.fmul.ss   f9, f10, f11
          fld.l       address, f5
          d.pfadd.ss  fx, fx, f4         ; Overlaps f5, if last stage result is double-precision

  11. During a locked sequence, a transition to or from dual-instruction mode is not permitted.
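Two of the dual-instruction-mode restrictions above, the 64-bit alignment rule and the ban on overlapping register destinations, lend themselves to a mechanical check. The Python sketch below illustrates both. It assumes that fld.l, fld.d and fld.q write 1, 2 and 4 consecutive floating-point registers starting at the named register, and that a double-precision result occupies a register pair; both assumptions are inferred from the comments in the illegal cases rather than stated in this section. All function names are hypothetical.

```python
# Number of consecutive registers written by each load size (assumed).
LOAD_REGS = {"l": 1, "d": 2, "q": 4}


def dual_mode_aligned(address: int) -> bool:
    """True if the three least-significant address bits are zero."""
    return address & 0x7 == 0


def load_targets(first_reg: int, size: str) -> set:
    """Registers written by a load of the given size suffix."""
    return set(range(first_reg, first_reg + LOAD_REGS[size]))


def fp_targets(dest_reg: int, double_result: bool) -> set:
    """Registers written by a floating-point result."""
    return set(range(dest_reg, dest_reg + (2 if double_result else 1)))


def destinations_overlap(load_reg, load_size, fp_dest,
                         fp_double=False, exempt_f0_f1=True):
    """True if the load and the floating-point result write a common register.

    exempt_f0_f1 models the exception for the companion instruction; pass
    False when checking the following pipelined instruction.
    """
    if exempt_f0_f1 and fp_dest in (0, 1):
        return False
    return bool(load_targets(load_reg, load_size) & fp_targets(fp_dest, fp_double))


if __name__ == "__main__":
    print(destinations_overlap(4, "d", 5))  # illegal case 1
    print(destinations_overlap(0, "q", 3))  # illegal case 2
    print(destinations_overlap(5, "l", 4, fp_double=True, exempt_f0_f1=False))  # case 3
```

All three illegal cases are reported as overlapping, while a pairing such as fld.d into f4 with a result in f6 is not.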
Table 8.9 Instruction Characteristics


Instruction   Execution   Pipelined?   Sets   Faults   Performance   Programming
              Unit        Delayed?     CC?             Notes         Restrictions


adds
addu
and
andh
andnot
andnoth
bc
bc.t
bla
bnc
famov.r
fiadd.z
fisub.z
fix.p
fld.y
frsqr.p
fst.y
fsub.p
ftrunc.p
fxfr
fzchkl
fzchks
intovr
ixfr
ld.c
Table 8.9 Instruction Characteristics (Continued)


Instruction   Execution   Pipelined?   Sets   Faults   Performance   Programming
              Unit        Delayed?     CC?             Notes         Restrictions


pfaddz
pfam.p
pfamov.r
pfeq.p
pfgt.p
pfiadd.z
pfisub.z
pfix.p
pfld.z
pfmam.p
pfmsm.p
pfmul.p
pfmul3.dd
pform
pfsm.p
pfsub.p
pftrunc.p
pfzchkl
pfzchks
pst.d
shl
DATA SHEET REVISION REVIEW


The following list represents the key differences between version 002 and version 001 of the i860 XR
Microprocessor Data Sheet.

1. Big-endian description in section 2.3 has been expanded.

2. Bit 17 of the Extended Processor Status Register (EPSR) is the INT bit which reflects the value on
   the interrupt pin (INT), as described in section 2.2.4 entitled "EXTENDED PROCESSOR STATUS
   REGISTER". This is a documentation update only.

3. The cacheability of a page is controlled by NOR'ing the value of the CD, WT bits and the KEN#
   input pin, as described in section 2.5 entitled "Caching and Cache Flushing" and section 3.1.14
   entitled "Cache Enable (KEN#)". This is a documentation update only.

4. The NOTE section in section 2.5 entitled "Caching and Cache Flushing" has been updated to clarify
   the paging requirement on changing the DTB field in the dirbase register.

5. Information on register encoding is added in section 8.2 entitled "Instruction Format and
   Encoding". This is a documentation update only.


The following list represents the key differences between version 003 and version 002 of the i860 XR
Microprocessor Data Sheet.
Specification Changes:

1. Specification changes for improved AC performance are in section 7.3.

2. HOLD is acknowledged during locked bus cycles. See section 3.1.8.

3. Additional paths have been added to the bus state diagram to allow direct transitions from states
   T12 and T11 to state TH. See Figures 4.1 and 4.10.

4. Two new instructions, (p)famov.r, have been added. These replace (p)fadd.ds and (p)fadd.sd in the
   assembler pseudo-ops (p)fmov.r. These changes are in section 8.1 and tables 2.7, 8.7, and 8.9.


Documentation Changes:

1. Big and little endian description has been expanded in sections 2.2.2, 2.3, and Figure 2.8.

2. The actions and explanations of the lock, unlock, and st.c dirbase changing the BL bit have been
   updated in sections 2.2.4, 3.1.5, 3.1.8, 4.3.4, 4.3.5, and 8.1.

3. The explanation of the AA and MA bits of the fpsr has been expanded in section 2.2.8.

4. The explanation of the WT bit of the Page Table Entries has been expanded in sections 2.4.4.4 and
   2.5.

5. A change concerning the locking of the bus during address translation is explained in sections
   2.4.5 and 2.8.5.

6. A further explanation on when to flush the data cache is given in section 2.5.

7. The explanation of the floating point multiplier pipeline has been expanded in section 2.6.1.

8. The explanation of BREQ has been expanded in section 3.1.4 and Figure 4.1.

9. The explanation of result exceptions has been expanded in sections 2.8 and 3.2.

10. Instruction fetch identification has been clarified in section 3.1.6 and table 3.2.

11. Bus cycle diagrams in Figures 4.7, 4.8, and 4.10 have been clarified/corrected.

12. Precision specification .r has been added to section 8.0 and table 8.1.

13. In section 8.4, performance note 9 has been added, programming restriction d has been changed,
    and programming restriction f has been added. Table 8.9 has been updated to reflect these
    changes.

14. The description of testability has changed in sections 3.3 and 3.3.2. RESET and HOLD must be
    asserted by the tester to force the chip outputs to float (tri-state).
The following list represents the major differences between version 004 and version 003 of the i860 XR
Microprocessor Data Sheet:

Section 2.2.4    The explanation of the WP bit of the epsr has been expanded.
Section 2.8.2    More information on the instruction trap has been added.
Section 2.8.4    The instruction access trap has been clarified.
Section 2.8.7    The values of registers after a reset trap have been specified.
Section 3.1.4    BREQ timing has been clarified.
Section 3.1.5    The calculation of interrupt latency has been corrected.
Section 3.1.6    The description of the byte-enable signals has been expanded.
Section 3.1.8    The relation between the lock instruction and the LOCK# signal has been clarified.
                 The BL bit should no longer be changed by writing to the dirbase register.
Section 6.0      The thermal specifications have been updated.
Section 7.3      The A.C. Characteristics for CLK have changed.
Section 7.3      Advance timing information for the 50 MHz clock rate has been added. These timings
                 are subject to change without notice.
Section 8.0      The operand naming conventions have improved.
Section 8.2.1    The encoding of the flush instruction has been corrected.
Section 8.3      The data-dependent multiplier freeze has been eliminated. Other freeze conditions
                 have been corrected or clarified.


The following list represents the major differences between version 005 and version 004 of the i860 XR
Microprocessor Data Sheet.

Section 2.2.4    OF bit is writable only in supervisor mode using ST.C.
Section 3.1.1    CLK rate has been updated.
Section 5.0      Figure 5.3 has been corrected.
Section 6.0      More information on measuring case temperature has been added.
Section 6.0      Figure 6.1 has been updated to include 25 MHz.
Section 6.0      Table 6.1 has been corrected.
Section 6.0      Table 6.2 has been updated to include 25 MHz.
Section 7.2      The D.C. Characteristics have been updated to include 25 MHz power supply current.
Section 7.3      The A.C. Characteristics for CLK have been changed.
Section 7.3      50 MHz clock rate has been deleted.
Section 7.3      25 MHz A.C. Specifications have been added.
Section 7.3      Figure 7.1 has been corrected.
Section 8.3      The data-dependent multiplier rounding freeze has been eliminated.
Section 8.4      Programming restrictions for dual-instruction mode are added.
82495XP CACHE CONTROLLER/
82490XP CACHE RAM


■ Two-Way, Set Associative, Secondary Cache for i860™ XP Microprocessor
■ 50 MHz "No Glue" Interface with CPU
■ Configurable
  - Cache Size 256 or 512 Kbytes
  - Line Width 32, 64 or 128 Bytes
  - Memory Bus Width 64 or 128 Bits
■ Dual-Ported Structure Permits Simultaneous Operations on CPU and Memory Buses
■ Efficient MRU Way Prediction
  - Zero Wait States on MRU Hit
  - One Wait State on MRU Miss
■ Dynamically Selectable Update Policies
  - Write-Through
  - Write-Once
  - Write-Back
■ MESI Cache Consistency Protocol
■ Hardware Cache Snooping
■ Maintains Consistency with Primary Cache via Inclusion Principle
■ Flexible User-Implemented Memory Interface
  - Clocked or Strobed
  - Synchronous or Asynchronous
  - Pipelining
  - Memory Bus Protocol
■ 82495XP Cache Controller Available in 208-Lead Ceramic Pin Grid Array Package
■ 82490XP Cache RAM Available in 84-Lead Plastic Quad Flatpack Package
  (See Packaging Handbook, Order #240800)

The Intel 82495XP cache controller and 82490XP cache RAM, when coupled with a user-implemented memo- 
ry bus controller, provide a second-level cache subsystem that eliminates the memory latency and bandwidth 
bottleneck for a wide range of multiprocessor systems based on the i860 XP microprocessor. The CPU 
interface is optimized to serve the i860 XP microprocessor with zero wait states at up to 50 MHz. A secondary 
cache built from the 82495XP and 82490XP isolates the CPU from the memory subsystem; the memory can
run slower and follow a different protocol than the i860 XP microprocessor.


Figure 0-1. Secondary Cache Configuration


Intel, intgl, and i860 are trademarks of Intel Corporation. 


Intel Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in an Intel product. No other circuit patent
licenses are implied. Information contained herein supersedes previously published specifications on these devices from Intel. June 1991
© INTEL CORPORATION, 1991 Order Number: 240956-001
1.0 82495XP/82490XP PINOUTS

Figure 1-1. 82495XP Pinout (Bottom View)

Figure 1-2. 82495XP Pinout (Top View)

Figure 1-4. 82490XP Pinout (Bottom View)

1.1 Pin Cross Reference Tables
Table 1-1. 82495XP Pin Cross Reference by Name 


| Signal Location — Signal Location Signal Locatio 
ADS # B15 AHOLD A17 BGT # | M03 » 


C 


0 D 
0 
CNA # [CFGO] LO CRDY # [SLFTST #] M CWAY OO — -J08 - 
N 
0 
PO 
R 
0 


EADS # J15 FLUSH # [NCPFLD #] FPFLD # [FPFLDEN] J04 
FSIOUT # HITM#[CPUTYP] D17 INVICLEN1] | ~ K15 


3 
DO1 
H 
2 


C 
S 


SNPCYC # HO3 SNPINV 


0 
4 
4 03 
4 
2 
14 
4 02 
4 4 
04 
16 03 
6 
04 7 
02 
14 04 
01 
7 
0 
7 07 
14 08 
14 ; 
8 6 
3 
0 
01 
13 3 
14 2 
2 
10 09 
3 01 
05 


; 
; 
1 
1 
P4 
0 J 
MSET6 R17 MSET7 St MSET8 P10 
MSET9 Q12 MTAGO Q1 MTAG1 ~ Pog 
MTAG1O ~ Qo MTAG11 | P MTAG2 Qog 
1 
0 
1 
1 
1 
F 
Pp 


_ SNPNCA | QO03 
Table 1-1. 82495XP Pin Cross Reference by Name (Continued)


Signal Location Signal Location Signal Location | 


SwEND#ICrGH 00 
TAGS BOS |: TAGS | C07 
TAGS 007 
POS 
P02 


WAY Lis | WBA M14 
WOWTeIWAWAST] Kia 
pwRe + Bi7_ | WAAR ba OP 


NC A14, A15, S01,S02 | Vcc AO05-A08, A10-A13, E01, E17, Vss BO5-B08, B10-B11, B13, E02, 
HO1, H17, KO1, K17, LO1, L17, E16, F02, HO2, H16, J02, J16, 
C09, N17, F17, G01, G17, K02, K04, K16, LO2-L03, L16, 
M17, N01, SO05-S13 C10, N16, G02, G16, R02, RO5- 
R10, M16, N02, R11-R13 


CDATA‘ | 
CDATAS | 51 CDATA6 52 CDATA7 57 
CDATA4 : 46 | CLK © 30 | : 


HITM# - | jhe 41 MBRDY#I[MISTB]  . 
42 MDATAO AB 
14 MDATA2 | 10 MDATA3 | 6 


MDATA4 
MDATA7 — 


+ a a 
NC —_ 83 | Vcc 5,9, 13,17, 29,35,50, | Vss 7,11, 15, 19, 31, 33, 34, 47, 
| | aon 56, 74 Ae | 
1.2 Quick Pin Reference


BGT# [C490LDRV]    Bus Guaranteed Transfer, [82490XP Low Drive]
    This signal is generated by the memory bus controller (MBC) to the 82495XP. It indicates to the
    82495XP a commitment by the MBC to complete the cycle on the memory bus. Until BGT# activation
    the 82495XP owns the cycle and will abort it if intervening snoops happen. After BGT# the cycle
    is owned by the MBC until its completion. From BGT# until SWEND# snoops will be accepted, but
    none will be processed until SWEND# activation.
    During RESET's falling edge, this signal controls the drive strength of the 82495XP to 82490XP
    interface signals. This strength is a function of the cache size, and therefore the number of
    82490XPs. Refer to the layout specifications section for more details.


BLE#    BE Latch Enable
    The BLE# signal is used to control the enable line of an external '377-type latch. The latch
    captures the i860 XP CPU's BE (Byte Enable) signals and other CPU-provided cycle attributes
    which do not go through the 82495XP.


82495XP Burst Ready
    This is the burst ready indication from the memory bus controller. The MBC should connect its
    burst ready indication to the CPU BRDY#, the 82495XP BRDY# and the 82490XP BRDY#. In the CPU, it
    provides the same function as that described in the CPU data sheet. The 82495XP will only use
    this indication for burst tracking purposes. In the 82490XP, it increments the CPU latch burst
    counter.


CADS#    Cache Address Strobe
    This signal is generated by the 82495XP and used by the memory bus controller. Its assertion
    requests execution of a memory bus cycle by the memory bus controller. This signal when active
    indicates that the cache cycle control and attribute signals are valid.


82495XP AHOLD
    This signal is generated by the 82495XP to track the CPU AHOLD signal when used for warm-reset
    and LOCKed sequences. It also provides information about CPU and cache BIST.


Cache Data/Control
    This is a cycle definition signal driven by the 82495XP. It indicates the type of memory bus
    cycle requested. This signal is valid with CADS# and can be pipelined by the memory bus
    controller.


CDTS#    Cache Data Strobe
    This signal is driven by the 82495XP to the memory bus controller. CDTS# for read cycles
    indicates that in the next CLK the memory bus controller can generate the first BRDY# for the
    read cycle. For write cycles it indicates when data is available on the memory bus. Usage of
    this signal allows complete independence between address strobes (CADS#, SNPADS#) and data
    strobe.


CFG0-2    Cache Configuration bits 0-2
    These signals are inputs to the 82495XP. CFG0-2 allow the 82495XP to be configured to 5
    different modes. Different modes indicate 82495XP/CPU line ratio, tag size (4K/8K), and lines
    per sector.




Clock , : 

This signal provides the fundamental timing for the 82495xXP, 82490XP and 
CPU. It must be provided to the 82495XP, 82490XPs, CPU and memory bus 
controller components with minimal skew. 


CM/IO# Cache Memory/lO 
This signal is driven by the 82495XP and is a cycle definition signal. It 
indicates the type of memory bus cycle requested. This signal is valid with 
CADS# and can be pipelined by the memory bus controller. 


CNA # [CFG] 82495xXP Next Address Enable, [Configuration Pin 0] | 
| This signal is driven by the memory bus controller and supplied to the 
82495xP. It is used by the memory bus controller to dynamically pipeline 
CADS # cycles. 
During RESET falling edge it functions as the 82495XP CFGO input. 


CRDY# [SLFTST#]   Cache Memory Bus Ready, [82495XP Self Test]
This signal is generated by the memory bus controller and informs the
82495XP and 82490XP that a memory bus cycle has been completed.
CRDY# activation ends the memory bus cycle.
During RESET's falling edge, if this signal is sampled low (active) and
MBALE is sampled high (active), the 82495XP self test will be invoked.


CWAY     Cache Way
CWAY is driven by the 82495XP and is a cycle definition signal that
indicates to the memory bus controller the WAY to be used by the
requested cycle. On line-fills it indicates the way into which the line
will be loaded. For write-backs it indicates the WAY that was written
back. This signal is valid with CADS#.


CW/R#    Cache Write/Read
This signal is driven by the 82495XP and is a cycle definition
signal. It indicates the type of memory bus cycle requested. This signal is
valid with CADS# and can be pipelined by the memory bus controller.


DRCTM#   Memory Bus Direct to [M] State
This signal is an input to the 82495XP. It is the mechanism by which the
memory bus can dynamically inform the 82495XP of a request to skip the
[E] state and move the line directly to the [M] state. This signal is sampled
by the 82495XP when SWEND# is asserted.


FLUSH# [NCPFLD#]   Flush the 82495XP cache, [Enable Non-Cacheable PFLD]

This signal is an input to the 82495XP. When active, FLUSH# causes the
82495XP to write back all of its modified lines to main memory and then
invalidate all tag locations. At the end of a flush operation the 82495XP
tag array is completely invalidated.

During RESET activation, this pin functions as the NCPFLD# configuration
signal which, with FPFLDEN, selects one of three modes for handling
i860 XP CPU floating point load cycles.


FPFLD# [FPFLDEN]   FIFO PFLD Enable, [PFLD Mode Select]
During RESET, the FPFLDEN and NCPFLD# inputs select one of three
modes to handle i860 XP CPU pipelined floating point load cycles. In the
mode which supports an external FIFO, the FPFLD# output indicates a
PFLD cycle to be loaded into the FIFO.


FSIOUT#   Flush/Sync/Initialization Output

This signal is an output of the 82495XP and indicates the start and end of
three operations: Flush, Sync, and Initialization. The output is activated
when the operation internally begins and is de-activated when the
operation ends.

KLOCK#   82495XP LOCK#

This signal is driven by the 82495XP and indicates to the memory bus
controller a request to execute atomic read-modify-write sequences.
KLOCK# is active with the CADS# of the first LOCKed operation and
remains active until at least the clock following CADS# of the last cycle
of the LOCKed operation.


KWEND# [CFG2]   Cacheability Window End, [Configuration Pin 2]

This signal is generated by the MBC and indicates to the 82495XP that the
Cacheability Window has expired. At this point the 82495XP will latch the
memory cacheability signal (MKEN#) and make decisions based on the
cacheability attribute. MRO#, which indicates the Read-Only cycle
attribute, is also sampled at this point.
During RESET's falling edge this line functions as the CFG2 configuration
signal which is used to configure the 82495XP/82490XP with cache
parameters.


MALE [WWOR#]   Memory Bus Address Latch Enable, [Weak Write Ordering]

This signal is generated by the memory bus controller and controls an
82495XP internal transparent address latch (373-like). CADS# will
generate a new address at the input of the internal address latch. MALE
activation (high) allows this address to flow to the memory bus,
provided MAOE# is active. When MALE is inactive (low), the address at the
latch input is latched.
WWOR# configures the 82495XP into strong or weak write-ordering
mode.
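
The MALE/MAOE# behavior described above can be illustrated with a small behavioral model. This is only a sketch of a generic 373-style transparent latch with an output enable; the class and argument names are hypothetical and timing is not modeled.

```python
class TransparentLatch:
    """Behavioral sketch of a 373-like transparent latch with output enable."""

    def __init__(self):
        self.held = None  # last value that passed through the latch

    def step(self, d, male, maoe_n):
        """Return the value driven on the bus, or None when tristated.

        d      -- address presented at the latch input
        male   -- latch enable, active high (latch is transparent when True)
        maoe_n -- output enable, active low (outputs driven when False)
        """
        if male:
            self.held = d  # transparent: the stored value follows the input
        # With male low, self.held keeps the last value that flowed through.
        return None if maoe_n else self.held


latch = TransparentLatch()
trace = [
    latch.step(0x1000, male=True, maoe_n=False),   # address flows to the bus
    latch.step(0x2000, male=False, maoe_n=False),  # latched: old address held
    latch.step(0x2000, male=True, maoe_n=True),    # outputs tristated
]
```

In this model the latch still tracks its input while the outputs are tristated, which is the usual behavior of a 373-type part; whether the 82495XP does the same is an assumption.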


MAOE#    Memory Bus Address Output Enable

This signal is generated by the memory bus controller and controls the
82495XP's output buffers for the memory bus address latches. The 82495XP
drives the memory bus address lines if MAOE# is active (low). Otherwise,
they are tristated. MAOE# also serves as a qualifier for snooping cycles:
when it is inactive, snoops are enabled.


MBALE [HIGHZ#]   Memory Bus, 82495XP Sub-Line Address Latch Enable,
                 [High Impedance Output]
This signal has the same function as MALE but controls only the 82495XP
sub-line addresses. It is generated by the memory bus controller and
controls an 82495XP internal transparent address latch (373-like).
CADS# will generate a new address at the input of the internal address
latch. MBALE activation (high) allows the sub-line address to flow
to the memory bus, provided MBAOE# is active. When MBALE is inactive (low),
the sub-line address at the latch input is latched.
HIGHZ#, if active along with SLFTST#, causes the 82495XP to float all of
its outputs.


MBAOE#   Memory Bus, 82495XP Sub-Line Address Output Enable
This signal has a function similar to MAOE# but controls only the
82495XP sub-line addresses.
If MBAOE# is active (low), the 82495XP will drive the sub-line portion of
the address onto the memory bus. Otherwise, it is tristated. MBAOE# is also
sampled during snoop cycles. If MBAOE# is sampled inactive with
SNPSTB#, the snoop write-back cycle (if any) will begin at the sub-line
address provided. If MBAOE# is active with SNPSTB#, the snoop write-back
will begin at sub-line address 0.


MBRDY# (MISTB)   Memory Bus Ready, (Memory Input Strobe)
This pin is an input to the 82490XP. It is used in clocked bus mode to
indicate the end of a transfer. When active (low) it indicates that the
82490XP should increment the burst counter and either output the next
data or get ready to accept the next data.
In strobed memory bus mode this pin is the input data strobe to the
82490XP. On each MISTB edge, the 82490XP latches the data and
increments the burst counter.


MCACHE#   82495XP Internal Cacheability
This signal is driven by the 82495XP. On read cycles, this signal indicates
the cycle's internal cacheability attribute. In write cycles MCACHE# is only
active for write-back cycles. MCACHE# is not activated for I/O cycles,
special cycles, or locked cycles.


MCFA6-MCFA0     Memory Bus Configurable address lines
MSET10-MSET0    Memory bus SET number
MTAG11-MTAG0    Memory bus TAG bits

These are the memory bus address lines of the 82495XP and should be
connected to the A31-A2 (A31-A3 for a 64-bit bus) signals of the memory
bus. These signals, along with the byte enables, define the physical area of
memory or I/O accessed.
The 82495XP drives these signals in normal memory bus cycles and uses
them as inputs during snooping.


MCLK [MSTBM#]   Memory Bus Clock, [Memory Input Strobe]
In clocked memory bus mode this pin provides the memory bus clock to the
82490XP. In clocked mode, memory bus signals and memory bus data are
sampled on the rising edge of MCLK. In a clocked memory bus write,
data is driven off of MCLK or MOCLK depending upon the configuration.
This pin is an input to the 82490XP. It is sampled during reset and
determines the memory bus type. If active (low), the memory bus will be
strobed. If inactive (high), the memory bus will be clocked.
If a clock is detected at this input, this pin becomes the memory bus clock
and clocked memory bus mode is selected.


MDATA0-MDATA7   Memory Bus Data

These pins are the 8 memory data pins of the 82490XP. All or part of these
pins will be used depending on the cache configuration. In clocked memory
bus mode, these pins are sampled with the rising edge of MCLK. New data
is driven out on these pins with MEOC# or the rising edge of MCLK or
MOCLK together with MBRDY# active. In strobed memory bus mode,
these pins are sampled on each MISTB edge. New data is driven out on
these pins with each MOSTB edge.


MDOE#    Memory Data Output Enable

This signal is an input to the 82490XP. The memory bus output enable is
used to control the 82490XP's driving of data onto the memory bus. When
this pin is inactive (high), the MDATA[0:7] pins are tristated. When this
pin is active (low), the MDATA[0:7] pins are actively driving data. The
function of this pin is the same for strobed or clocked memory bus
operation, as MDOE# has no relation to CLK or MCLK.


MEOC#    Memory End of Cycle

This signal is an input to the 82490XP. Since it is synchronous to the
memory bus, it may be used to end a cycle on the memory bus and begin a
pending cycle without waiting for synchronization to the CPU CLK. MEOC#
also causes the latching or driving of data and resetting of the memory burst
counter.


MFRZ# [MEMLDRV]   Memory Freeze, [Memory Bus Low Drive]
This signal is an input to the 82490XP. It is used for write cycles that could
cause allocation cycles. When this pin is active (low), write data is latched
in the 82490XP. The subsequent allocation will not overwrite data latched by
the write. This prevents the actual write to memory from having to be
performed on the memory bus. The allocated line will be placed in the [M]
state in the cache since memory has not been updated.
During RESET's falling edge, this signal is sampled to indicate the
82490XP's memory bus driving strength. The 82490XP provides normal and
high drive capability buffers.


MHITM#   Memory Bus Hit to Modified Line

This signal is driven by the 82495XP during snoop cycles and indicates
whether the snooping address hit a Modified line in the 82495XP cache. The
82495XP automatically schedules the writing-back of modified lines when
snoop hits occur. MHITM# is activated the CLK after SNPCYC# and will
remain active until the next SNPSTB#.


MKEN#    Memory Bus Cacheability

This signal is an input to the 82495XP. It is the memory bus cache enable
pin. It indicates to the 82495XP whether the current memory bus cycle is
cacheable. This pin is sampled by the 82495XP with KWEND# assertion.


MOCLK (MOSTB)   Memory Output Clock, (Memory Output Strobe)

MOCLK controls a transparent latch at the 82490XP data outputs. By
providing a clock input, skewed from MCLK, MDATA hold time may be
increased.
In strobed bus mode this pin is the data output strobe. On each MOSTB
edge, new data will be output onto the memory bus.


MRO#     Memory Bus Read-Only

This pin is an input to the 82495XP. It is the READ-ONLY attribute pin. It
indicates to the 82495XP that the accessed line should get a READ-ONLY
attribute. READ-ONLY lines will be non-cacheable in the first level
cache. READ-ONLY lines will be cached in the 82495XP if MKEN# is
sampled active during KWEND# and will be cached in the [S] state. This pin
is sampled by the 82495XP with KWEND# assertion.


MSEL# [MTR4/TR8#]   Memory Select, [Memory Transfer]
This signal is a chip select input to the 82490XP. MSEL# activation
qualifies the MBRDY# input of the 82490XP. MSEL# going active causes
the sampling of MZBT# for the next cycle. MSEL# going inactive resets
the 82490XP's internal memory burst counter.
This pin is used to determine the number of transfers necessary on the
memory bus for each cache line. If high, there are 4 transfers on the
memory bus for each cache line. If low, there are 8 transfers on the
memory bus for each cache line.


MTHIT#   Memory Bus Tag Hit
This signal is driven by the 82495XP during snoop cycles. It indicates
whether the snooping address hit any line (exclusive, shared, or modified)
in the 82495XP cache. MTHIT# is activated the CLK after SNPCYC# and
will remain active until the next SNPSTB#.


MWB/WT#   Memory Bus Write Policy
This signal is an input to the 82495XP. It is the mechanism by which the
memory bus can dynamically inform the 82495XP of the cycle write policy
(Write-Through/Write-Back). This signal is sampled by the 82495XP with
SWEND# activation.


MZBT# [MX4/MX8#]   Memory Zero Based Transfer, [Memory Te Bits]
This signal is an input to the 82490XP. When this pin is sampled active
(with MSEL# or MEOC#) it indicates that the memory bus cycle should
start with burst location zero independent of the sub-line address
requested by the CPU.
This pin is used to determine the number of IO pins used for the memory
bus. When HIGH it indicates that 4 IO pins are used per 82490XP. When
LOW it indicates that 8 IO pins are used.


NENE#    Next Near
This signal is generated by the 82495XP and indicates to the memory bus
controller if the address of the requested memory cycle is "near" the
address of the previously generated one (in the same 2K DRAM page).
This information can be used by the memory bus controller to optimize
access to paged or static column DRAMs. This signal is valid together with
CADS#.


PALLC#   Potential Allocate
This signal is generated by the 82495XP and indicates to the memory bus
controller that the current write cycle can potentially allocate a cache line.
Potential allocate cycles are cycles which are 82495XP misses with PCD,
PWT inactive.


RDYSRC   Ready Source
This signal is an output of the 82495XP. It indicates the source of the
BRDY generation for the CPU. When high it indicates that the memory bus
controller should generate BRDYs to the CPU; when low it indicates that
the 82495XP will be the one providing BRDYs.
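
The "Next Near" (NENE#) indication described above amounts to a page comparison between consecutive cycle addresses. The sketch below assumes that "2K DRAM page" means a 2048-byte aligned region of the byte address space; the source does not state the unit, and the function name is hypothetical.

```python
PAGE_BYTES = 2 * 1024  # assumed interpretation of the "2K DRAM page"


def same_page(prev_addr, addr, page_bytes=PAGE_BYTES):
    """Return True when both addresses fall in the same aligned page."""
    return (prev_addr // page_bytes) == (addr // page_bytes)


near = same_page(0x0000_1000, 0x0000_17FC)  # both in page 2
far = same_page(0x0000_17FC, 0x0000_1800)   # crosses into page 3
```

A memory bus controller could use such a result to decide whether a DRAM row can stay open for the next access.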


RESET    Reset

This signal forces the 82495XP and 82490XP to begin execution at a known state. Its
falling edge will sample the state of the configuration pins. RESET is an asynchronous
input to the 82495XP and 82490XP.

The following 82495XP pins are sampled during the reset falling edge:

CNA# [CFG0]: CFG0 line of the 82495XP configuration inputs.
SWEND# [CFG1]: CFG1 line of the 82495XP configuration inputs.
KWEND# [CFG2]: CFG2 line of the 82495XP configuration inputs.
CFG0-CFG2 signals are used to configure the 82495XP/82490XP with cache
parameters. They define the lines/sector, line ratio, and number of tags.

FLUSH# [NCPFLD#]: Enables decoding of the non-cacheable PFLD mode. Active if low.
FPFLD# [FPFLDEN]: Enables the external FIFO PFLD mode. Active high.

BGT# [C490LDRV]: Indicates the driving strength of the 82495XP/82490XP interface. If
high, the 82495XP can drive up to 10 82490XPs without derating. If low, the 82495XP
can drive up to 18 82490XPs without derating.

SYNC# [MEMLDRV]: Indicates the 82495XP's memory bus driving strength.
SNPCLK [SNPMD]: Indicates the snoop mode, synchronous or asynchronous.
MALE [WWOR#]: Enforces strong or weak write-ordering consistency.
MBALE [HIGHZ#]: If active along with SLFTST#, will tristate all 82495XP outputs.

The following 82490XP pins are sampled during the reset falling edge:

PAR#: If active (low), this pin configures the 82490XP as a parity storage device. The
parity configuration stores the parity bits belonging to data stored in other 82490XPs.

MZBT# [MX4/MX8#]: Determines the number of IO pins used for the memory bus
interface. If high, four IO pins are chosen. If low, eight IO pins are chosen.

MSEL# [MT4/MT8#]: Determines the number of transfers necessary on the memory bus
for each cache line. If high, four memory bus transfers are needed to fill a cache line. If
low, eight memory bus transfers are needed to fill a cache line.

MCLK [MSTBM#]: If active (low), this pin indicates a strobed memory bus configuration. If
inactive (high), a clocked memory bus is chosen.

MFRZ# [MEMLDRV]: Indicates the 82490XP's memory bus driving strength.
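
As an illustration of the reset-time strapping listed above for the 82490XP, the following sketch decodes the pin levels sampled at the RESET falling edge into configuration choices. The class, field, and key names are hypothetical; only the high/low meanings are taken from the list above, and the MFRZ# [MEMLDRV] polarity is left out because the text does not state it.

```python
from dataclasses import dataclass


@dataclass
class Straps82490XP:
    """Pin levels sampled at the RESET falling edge (True = high)."""
    par_n: bool   # PAR#
    mzbt_n: bool  # MZBT# [MX4/MX8#]
    msel_n: bool  # MSEL# [MT4/MT8#]
    mclk: bool    # MCLK [MSTBM#]


def decode_82490xp_straps(s):
    """Translate sampled levels into the options named in the text."""
    return {
        "parity_device": not s.par_n,                # PAR# low: parity storage
        "io_pins": 4 if s.mzbt_n else 8,             # high: 4 IO pins, low: 8
        "transfers_per_line": 4 if s.msel_n else 8,  # high: 4 transfers, low: 8
        "bus_mode": "clocked" if s.mclk else "strobed",  # low: strobed bus
    }


cfg = decode_82490xp_straps(
    Straps82490XP(par_n=True, mzbt_n=False, msel_n=True, mclk=False)
)
```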


SMLN#    Same Cache Line

This signal is an output of the 82495XP. It is used to indicate to the memory bus controller
that the current cycle is to the same 82495XP line as the previous one. This indication
can be used by the memory bus controller to selectively activate its SNPSTB# signal to
other caches. For example, back-to-back snoop hits to the same line may be snooped
only once. This signal is valid together with CADS#.


SNPADS#   Cache Snoop Address Strobe
This signal is an output of the 82495XP. It has identical functionality to
CADS#, but is generated only on snoop write-back cycles. Considering that
snoop write-back cycles are the only ones which are generated independent of
CPU bus activity, this separate address strobe should ease implementation of
the memory bus controller. Whenever it is active, the memory bus controller
should abort all pending cycles (cycles for which BGT# was not yet issued;
after BGT# the memory bus controller is responsible for the cycle
completion). The 82495XP assumes that non-committed cycles are aborted upon
SNPADS# and may re-issue them after the completion of the snoop.


SNPBSY#   Snoop Busy
This signal is driven by the 82495XP. When inactive (high), it indicates that
the 82495XP is ready to accept another snoop cycle. SNPBSY# will be activated
for one of two reasons: a snoop hit to a modified line, or a back-invalidation
being needed when there is one already in progress. In either of these cases,
the 82495XP will not perform the look-up for a pending snoop until SNPBSY# is
de-activated.


SNPCLK [SNPMD]   Snoop Clock, [Snoop Mode]
This pin provides the 82495XP with the snoop clock to be used in clocked
memory interfaces. During clocked mode SNPSTB#, SNPINV, SNPNCA,
MBAOE#, MAOE#, and the address lines will be sampled by SNPCLK.
During RESET activation, this pin functions as the SNPMD (snoop mode)
signal. If high it indicates strobed snooping mode. If low it indicates
synchronous snooping mode. For clocked snooping mode, SNPCLK is
connected to the snoop clock source.


SNPCYC#   Snoop Cycle
This signal is an output of the 82495XP. It indicates when the snooping look-up
is actually taking place in the 82495XP tag RAM.


SNPINV   Snoop Invalidation
This signal is an input to the 82495XP and indicates the resulting line state in
case of a snoop hit cycle. If active, it forces the line to go to an invalid state.
This signal is sampled with SNPSTB#.


SNPNCA   Snoop Non Caching Device Access
This signal is an input to the 82495XP and provides the 82495XP information
on whether the current memory bus master is a non-caching device (DMA,
etc.). This indication allows the 82495XP to avoid changing line states from
exclusive to shared unnecessarily.
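
The snoop outputs and attribute inputs described above (MTHIT#, MHITM#, SNPINV, SNPNCA) can be summarized in a small behavioral sketch. The hit indications and the SNPINV and SNPNCA effects follow the pin descriptions; the remaining next-state choice (a hit without SNPINV ends up shared) is an assumption made for illustration, not a statement of the 82495XP state tables. The function name is hypothetical.

```python
def snoop(state, snpinv=False, snpnca=False):
    """Return (mthit, mhitm, next_state) for a snooped line.

    state is one of "M", "E", "S", "I" (MESI). Outputs are given as
    logical values (True = active), not as pin levels.
    """
    mthit = state in ("M", "E", "S")  # hit to any valid line
    mhitm = state == "M"              # hit to a modified line (write-back due)
    if not mthit:
        return (False, False, state)
    if snpinv:
        next_state = "I"              # SNPINV forces the line invalid
    elif snpnca and state == "E":
        next_state = "E"              # non-caching master: E need not become S
    else:
        next_state = "S"              # assumption: other hits end up shared
    return (mthit, mhitm, next_state)


results = [
    snoop("M"),
    snoop("E", snpnca=True),
    snoop("S", snpinv=True),
    snoop("I"),
]
```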


SNPSTB#   Snoop Strobe

This signal is an input to the 82495XP which is used to initiate a snoop.
SNPSTB# causes the latching of the snoop address and parameters. The
82495XP supports three latching modes: Clocked, Strobed, Synchronous. In
the clocked mode, address and attribute signals will be latched with the
activation of SNPSTB#.SNPCLK. In the strobed mode, address and attributes
will be latched by the SNPSTB# falling edge. In synchronous mode, address
and attribute signals will be latched with the activation of SNPSTB#.CLK.


SWEND# [CFG1]   Snoop Window End, [Configuration Pin 1]

This signal is generated by the MBC and indicates to the 82495XP that the
Snoop Window has expired. At this point the 82495XP will latch the memory
bus attributes: write policy (MWB/WT#) and direct to [M] transfer
(DRCTM#). At the end of the snooping window, all other devices have
snooped the bus master's address and have generated address caching
attributes on the bus. Once a cycle begins, the 82495XP prevents snooping
until it has received SWEND#. The 82495XP will act based on those
attributes and will update its tag RAM.

During RESET's falling edge this line functions as the CFG1 configuration
signal which is used to configure the 82495XP/82490XP with cache
parameters.
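
The attributes latched at KWEND# (MKEN#, MRO#) and at SWEND# (MWB/WT#, DRCTM#) together suggest how the state of a newly filled line might be chosen. The sketch below is a hypothetical illustration: the read-only and direct-to-[M] rules follow the pin descriptions, while mapping write-through lines to [S] and the priority order of the checks are assumptions.

```python
def line_fill_state(mken_active, mro_active, write_through, drctm_active):
    """Return the MESI state for a filled line, or None if not cached.

    All arguments are logical values (True = attribute asserted), not
    pin levels.
    """
    if not mken_active:
        return None   # cycle not cacheable: no line is allocated
    if mro_active:
        return "S"    # read-only lines are cached in the [S] state
    if write_through:
        return "S"    # assumption: write-through lines are kept shared
    return "M" if drctm_active else "E"  # DRCTM# skips [E], goes to [M]


fill_states = {
    "not cacheable": line_fill_state(False, False, False, False),
    "read only": line_fill_state(True, True, False, True),
    "direct to M": line_fill_state(True, False, False, True),
    "write back": line_fill_state(True, False, False, False),
}
```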


SYNC# [MEMLDRV]   Synchronize 82495XP cache, [Memory Bus Low Drive]

This signal is an input to the 82495XP. Activation of this line will cause the
synchronization of the 82495XP tag array with main memory. All 82495XP
modified lines will be written back to main memory. The difference between
FLUSH and SYNC is that on SYNC the 82495XP and CPU tag array will NOT
be invalidated. All the valid entries will be kept, with all modified lines
(M state) becoming non-modified (E state).
During RESET's falling edge, this signal is sampled to indicate the memory
bus driving strength. If it is sampled low, the maximum capacitive load
without derating is 100 pF. If it is sampled high, the maximum capacitive load
without derating is 50 pF.
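
The difference between FLUSH and SYNC described above can be shown on a toy tag array. This is a minimal sketch that maps hypothetical line addresses to MESI states; it models only the resulting states and the set of lines written back, not the bus cycles.

```python
def flush(lines):
    """Write back modified lines, then invalidate every tag entry."""
    written_back = [addr for addr, st in lines.items() if st == "M"]
    return written_back, {addr: "I" for addr in lines}


def sync(lines):
    """Write back modified lines; keep entries valid, M becomes E."""
    written_back = [addr for addr, st in lines.items() if st == "M"]
    after = {addr: ("E" if st == "M" else st) for addr, st in lines.items()}
    return written_back, after


tags = {0x100: "M", 0x140: "S", 0x180: "E", 0x1C0: "I"}
flush_wb, after_flush = flush(tags)
sync_wb, after_sync = sync(tags)
```

Both operations write back the same lines; only the tag states afterwards differ.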


TCK      Testability Clock

This signal is an input to both the 82495XP and 82490XP. This is the
boundary scan clock. This signal has to be connected to a clock
synchronous to CLK to ensure initialization of the test logic.


TDI      Testability serial input
This signal is an input to both the 82495XP and 82490XP.


TDO      Testability serial output
This signal is an output of both the 82495XP and 82490XP.


TMS      Testability Control
This signal is an input to both the 82495XP and 82490XP.


The following pins have internal pull-ups:

ADS#, NA#, FPFLD#, TDI, TMS, BGT#, KWEND#, SWEND#, CNA#, BRDY#, SYNC#,
FLUSH#, SNPSTB#, MRO#, DRCTM#, TCK, SNPCLK, MFRZ#, MZBT#, MCLK, MOCLK.

During the tri-state output testing sequence, all pull-ups will be disabled.

The following signals are glitch free. These signals are always at a valid
logic level following RESET:

CADS#, CDTS#, SNPADS#, SNPCYC#.
1.3 Output Pins


Table 1-3 lists all output pins, from which part(s) they are driven, and their active levels.


Table 1-3. Output Pins

Name                   Part               Active Level
CAHOLD                                    HIGH
CDTS#                  82495XP            LOW
CWAY                   82495XP
MCACHE#                82495XP            LOW
NENE#                  82495XP            LOW
PALLC#                 82495XP            LOW
RDYSRC                 82495XP            HIGH
SMLN#                  82495XP
SNPADS#                82495XP            LOW
TDO                    82495XP/82490XP
W/R#, CD/C#, CM/IO#    82495XP            -


1.4 Input Pins


Table 1-4 lists all input pins, which part(s) they are input to, their active level, and whether they are
synchronous or asynchronous inputs.

Table 1-4. Input Pins


Name                 Part               Active Level   Synchronous/Asynchronous
BGT#[C490LDRV]       82495XP                           Synchronous to CLK
CNA#[CFG0]           82495XP            LOW            Synchronous to CLK
CRDY#[SLFTST#]       82495XP/82490XP    LOW            Synchronous to CLK
DRCTM#               82495XP            LOW            Note 2
KWEND#[CFG2]         82495XP            LOW            Synchronous to CLK
MAOE#, MBAOE#        82495XP            LOW            Asynchronous
MEOC#                82490XP                           Synchronous/Asynchronous, Note 1
MKEN#                82495XP            LOW
MSEL#[MTR4/TR8#]     82490XP            LOW            Synchronous/Asynchronous, Note 1
MWB/WT#              82495XP
MZBT#[MX4/MX8#]      82490XP            LOW            Synchronous/Asynchronous, Note 1
SNPCLK[SNPMD]        82495XP
SYNC#[MEMLDRV]       82495XP
TCK                  82495XP/82490XP


NOTES:

(1) In clocked memory bus mode these pins are synchronous with MCLK. In strobed memory bus mode these pins are
asynchronous.
(2) MWB/WT#, DRCTM# must be synchronous to CLK during SWEND#. MKEN#, MRO# must be synchronous to CLK
during KWEND#.
(3) In clocked memory bus mode these pins are synchronous with SNPCLK. In strobed memory bus mode these pins are
asynchronous.


1.5 Input/Output Pins 


Table 1-5 lists all input/output pins, which part they interface with, and when they are floated.


Table 1-5. Input/Output Pins

Name               Part       Synch/Asynch          When Floated
FPFLD#[FPFLDEN]    82495XP    Synchronous to CLK    -
MCFA0-MCFA6        82495XP                          MAOE# = High
MDATA0-MDATA7      82490XP    Note 2                MDOE# = High and during Reset
MSET0-MSET10       82495XP                          MAOE# = High
MTAG0-MTAG11       82495XP                          MAOE# = High


NOTES:

(1) With MALE high and MAOE# low, these pins are synchronous to CLK.

(2) In clocked memory bus mode these pins are synchronous with MCLK. In strobed memory bus mode these pins are
asynchronous.
1.6 Pin State During Reset

Table 1-6. Pin State During Reset


NOTES:

(1) MSET, MTAG, and MCFA signals are high impedance during reset if MAOE# and MBAOE# are deasserted.

(2) The state of CAHOLD depends on whether self-test is selected (see testability chapter for details).

(3) The state of TDO is controlled by the boundary scan which is independent of other signals including RESET (see
testability chapter for details).


2.0 CHIPSET INTRODUCTION


The 82495XP/82490XP is a second-level cache controller chipset for the
i860 XP CPU. The chipset provides a unified code and data cache which is
software transparent. The 82495XP/82490XP has been designed to support a
high-speed CPU/cache core interface, and a same or lower speed memory
bus interface.

The 82495XP is the cache controller. It contains 8K tags and control logic
to control up to a 512K size cache. The 82490XP is a custom cache data RAM
designed to be used with the 82495XP. Between 8 and 18 82490XPs are
required to create a 256K to 512K cache, respectively. The memory bus
controller (MBC) is the set of logic required to interface the 82495XP and
82490XP to the memory bus. The MBC provides product differentiation, and
its implementation ultimately determines system performance.


2.1 Main Features

The 82495XP/82490XP have the following main features:

- Tracks the speed of the i860 XP CPU

- Large Cache Size support:
    4K or 8K Tags
    1 or 2 lines per sector
    4 or 8 transactions per line
    64 or 128-bit wide memory bus
    256K or 512K cache

- Write-Back cache with full multiprocessing consistency support:
    supports the MESI protocol
    watches memory bus to guarantee 1st level, 2nd level cache consistency
    maintains inclusion

- Two-way set-associative with MRU hit prediction algorithm

- Zero wait state hit cycles on MRU hit. One wait state on MRU misses

- Concurrent CPU and Memory Bus transactions

- Supports synchronous, asynchronous, and strobed memory bus architectures
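
The interaction between the two-way set-associative organization and MRU hit prediction listed above can be sketched as follows. This is an illustrative model only: the wait-state counts come from the feature list, while the function name and the MRU update on a hit in the non-predicted way are assumptions.

```python
def lookup(tag, ways, mru):
    """Look up a tag in one two-way set using MRU way prediction.

    ways -- the two tags stored in the set, indexed by way (0 or 1)
    mru  -- index of the most recently used way, which is tried first
    Returns (hit, wait_states, new_mru); wait_states is None on a miss,
    because a miss is serviced by the memory bus and is not modeled here.
    """
    if ways[mru] == tag:
        return True, 0, mru     # MRU hit: zero wait states
    other = 1 - mru
    if ways[other] == tag:
        return True, 1, other   # hit in the other way: one wait state
    return False, None, mru     # cache miss


set_tags = [0xABC, 0x123]
mru_hit = lookup(0xABC, set_tags, mru=0)
mru_miss = lookup(0x123, set_tags, mru=0)
cache_miss = lookup(0x777, set_tags, mru=0)
```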


2.2 CPU/Cache Core Description 


Figure 2-1 depicts a block diagram of the basic 
cache subsystem. The cache subsystem provides a 
gateway between the CPU and the memory bus. All 
CPU accesses which can be serviced locally by the 
cache subsystem will be filtered out from the memo- 
ry bus traffic. Therefore local cycles (CPU cycles 
which hit the cache and do not require a memory 
bus cycle) will be completely invisible to the memory 
bus providing the reduction in memory bus band- 
width necessary for multiprocessing systems. Anoth- 
er very important function of the 82495XP cache 
subsystem is to provide speed decoupling between 
the CPU and memory busses. Processors are quick- 
ly achieving operating frequencies which can be 
very difficult for the memory subsystem to meet. The 
82495XP cache subsystem is optimized to serve the 
CPU with zero wait-states up to very high frequencies
(50 MHz), at the same time providing the decoupling
necessary to run slower memory bus cycles.


Figure 2-1. 82495XP Cache Subsystem
[Block diagram: CPU, 82495 Cache Controller, Memory Bus Controller]


The basic functions of the cache subsystem elements are:


82495XP: Main control element, includes the tags and line states and
provides hit or miss decisions. It handles the CPU bus requests completely
and coordinates with the memory bus controller when an access needs the
memory bus. It controls the 82490XP data paths for both hits/misses to
provide the CPU with the correct data. It dynamically adds wait states
based on the MRU prediction mechanism. The 82495XP is also responsible for
performing memory bus snoop operations while other devices are using the
memory bus. The 82495XP drives the cycle address and other attributes
during a memory bus access. A block diagram of the 82495XP is shown in
Figure 2-2.


Figure 2-2. 82495XP Block Diagram
[Block diagram: Processor Bus Interface (to/from CPU), Directory Array
(Tag RAM), 82490 Interface (to 82490), Memory Bus Controller Interface,
Addr Latch and Snoop Latch (to/from memory bus)]
82490XP: Implements the cache SRAM storage and data path. It includes
latches, muxes, and logic which allow it to work in lock-step with the
82495XP to efficiently serve both hit and miss accesses. It takes full
advantage of internal silicon flexibility to provide a degree of
performance otherwise unachievable with discrete implementations. It
supports zero wait state hit accesses, concurrent CPU and memory bus
accesses, and includes a replication of the MRU bits for autonomous way
prediction. During memory bus cycles it acts as a gateway between CPU and
memory buses. A block diagram of the 82490XP is shown in Figure 2-3.


Figure 2-3. 82490XP Block Diagram
[Block diagram: Control from CPU, Control from 82495, CPU Bus Control,
Address Latch, Memory Bus Control, Control from MBC; data path with
CPU Bus Mux/Buffer (data to/from CPU), Write Back Buffer, Snoop Buffer,
Memory Buffer 0, Memory Buffer 1, Memory Bus Mux/Buffer (data to/from
memory bus)]


Memory Bus Controller: Server for memory bus cycles. It adapts the
CPU/Cache core to a specific memory bus protocol. It coordinates with the
82495XP line fills, flushes, write-backs, etc. The memory bus controller's
flexibility allows customers to easily adapt the 82495XP cache subsystem to
their specific architectures, and to provide their own differentiation.
Figure 2-4 shows an example memory bus controller. The MBC handles all
cycle control, data transferring, snooping, and any synchronization.


Figure 2-4. MBC Example Block Diagram
[Block diagram: optional synchronizers, data control, cycle control;
to/from address and control bus, to/from data, to/from snoop bus and
control bus]
3.0 CACHE OVERVIEW


This chapter gives a brief description of 82495XP/ 
82490XP configurations, interface, snooping mecha- 
nism, cycle control mechanism, and memory bus 
control mechanism. Each section of this overview is 
described in more detail in later chapters. 


3.1 Configuration 


The 82495XP/82490XP cache chipset offers a number of configuration
options. The system designer can choose from a number of different
operating characteristics, including memory bus modes, snooping modes, and
internal physical attributes (line size, lines per sector, etc.). The
flexibility of these configuration options allows the 82495XP/82490XP
cache to be used in a wide range of applications.


Configurations are selected by altering the 82495XP/82490XP inputs during
RESET. They are not dynamically changeable, and to conserve pins some
configuration inputs become 82495XP or 82490XP inputs/outputs after RESET.


3.1.1 PHYSICAL CACHE 


Physically, the 82495XP/82490XP can be configured to support many
different cache configurations. Selecting one cache configuration may
exclude other configurations. The 82495XP/82490XP can be configured
to support:


— 256K or 512K cache
— 64 or 128 bit wide memory bus
— One or two lines per sector
— 1:1, 1:2, or 1:4 CPU to 82495XP line size ratio
— 4 or 8 memory bus transactions per line
— 4K or 8K tag size
— Strong or weak write ordering


Figure 3-1 summarizes the basic configurations available when using
the 82495XP/82490XP. In the figure, LR denotes the 82495XP/CPU line
ratio and L/S the number of 82495XP lines per sector.


3.1.2 SNOOP MODES 


When another master snoops the 82495XP, the MBC must initiate the
snoop request and pass on the response. The 82495XP allows the MBC to
initiate this snoop request in one of three modes: synchronous,
clocked, and strobed. The snoop response of the 82495XP is always
synchronous.


In synchronous snoop mode, all snoop information is latched by the
82495XP synchronously to the CPU CLK. The snoop is then performed on
the next CLK edge and the response is given on the CLK edge after
that. This is the fastest possible method of snooping.


In clocked snooping mode, information is latched by 
the 82495XP with respect to an external snoop 
clock (slower than CLK) source. The 82495XP must 
internally synchronize this information to CLK and 
provide a response. 


In strobed snooping mode, information is latched 
into the 82495XP with respect to the falling edge of 
another signal. Thus, the snoop initiation is clock in- 
dependent. The 82495XP again synchronizes this in- 
formation with CLK. 


[Figure 3-1 residue: configuration chart for 64-bit and 128-bit memory buses, organized by number of memory bus transactions, number of 82490XP devices, and cache device width (2, 4, or 8 bits).]


Figure 3-1. 82495XP/82490XP Configurations 


3.1.3 MEMORY BUS MODES 


The 82490XP may be configured to be in one of two memory bus modes.
This mode determines how data is passed onto and off of the data bus.
The two modes are clocked mode and strobed mode. These modes need not
have any relation to the snoop mode chosen.


In clocked mode, data is driven from an external memory clock source
called MCLK, or read with respect to MCLK. MCLK is completely
independent of the CPU CLK source. There are inherent performance
advantages, however, in making this clock source synchronous or
half-clock (divided) synchronous to the CPU CLK.


In strobed mode, data is driven from the rising edge of one signal,
and read with the rising edge of another. Like the strobed snooping
mode, this carries no clock skew problems or memory bus speed
limitations.


3.2 CPU Bus Interface 


The CPU bus interface is the connection of the 82495XP and 82490XP to
the i860 XP CPU. Because this interface is optimized to achieve high
speed performance, it is not a flexible interface. The majority of
the signals in the CPU bus interface must be connected strictly
between the 82495XP/82490XP cache and the i860 XP CPU. Chapter 10
addresses the use of such signals.


Some CPU signals are, however, accessible by the MBC. These are the
following pins: RESET, CLK, BRDY2#, INT, BERR, PCHK#, PEN#, TCK, TDI,
TMS, TRST#, and TDO. CPU pins KB0, KB1, HIT#, and BREQ are also
available to the MBC, but are of limited use in an 82495XP/82490XP
system.


Other CPU pins flow through a '377 type latch to the MBC. The latch
enable is controlled by the 82495XP through the BLE# pin. The
following CPU signals flow through this latch: PCD, PWT, BE0#-BE7#,
CACHE#, LEN, PCYC, and CTYP.


3.3 82495XP/82490XP Interface 


The 82495XP/82490XP interface is the connection between the 82495XP
and 82490XP. Like the CPU bus interface, this isolated interface is
not flexible and may not be altered beyond what Intel has provided.


3.4 Memory Bus and Memory Bus 
Controller Interface 


The memory bus controller (MBC) is the interface logic required to
control the 82495XP/82490XP and connect it to the memory bus and the
rest of the system. The MBC may be simple enough to support a
single-CPU write-through cache, or complex enough to support a
multiprocessing cache with external tags. The 82495XP/82490XP is a
very flexible chipset, and the MBC determines exactly how the
82495XP/82490XP will work in a system.


An MBC consists of a few basic blocks: a snoop logic block, a cycle
control block (with synchronizers if necessary), and a data path
control block. The snoop block must be able to communicate with the
other caches when snooping is necessary. At the same time, the cycle
control block must interface to some arbitration logic for bus
arbitration.


3.4.1 SNOOPING LOGIC 


The MBC snooping logic is responsible for initiating a snoop in the
82495XP and providing the response to the rest of the system. Snoop
logic must recognize what other caches are doing, and snoop if
necessary. Snoop logic must also recognize when its 82495XP is not
capable of snooping and delay its snoop initiation.


When a cycle begins on the bus, all other caches snoop. Once all the
snoop results are returned to the master 82495XP, its snoop logic
must recognize the result and alter the cycle appropriately. This
could mean aborting the current cycle in memory, delaying the cycle
until a write-back is performed, or changing the master's tag state
according to the snoop information.


3.4.2 CYCLE CONTROL LOGIC 


Cycle control logic is responsible for initiating a memory bus cycle,
providing proper 82495XP cycle attributes during the cycle, and
terminating the cycle. Cycle control logic determines the
cacheability of the cycle, whether cycles are allocatable,
pipelining, and all aspects of the progress of the current cycle.


Since cycle control logic interfaces memory bus signals to the
82495XP, and since the memory bus is not necessarily synchronous to
the 82495XP CLK, it may also provide proper synchronization. Careful
design of this synchronization logic can minimize or eliminate
synchronization penalties.


3.4.3 DATA PATH CONTROL 


Data path control logic controls how data is written 
from the 82490XP or read into the 82490XP and 
CPU. It handles the actual transferring of data to/ 
from the memory data bus. Data path control logic 
also handles the CPU burst order, and the holding of 
data during allocation cycles. In systems with memory busses that
are wider than the CPU bus, the data path control logic steers data
to the correct 82490XPs.


3.5 Test 


The 82495XP/82490XP provides two means of cache testing: a built-in
self-test and a boundary scan test. The built-in self-test (BIST) is
initiated during RESET. The boundary scan test uses separate,
dedicated pins on the 82495XP. These are described in a later
chapter.


4.0 CACHE CONSISTENCY 
PROTOCOL 


One of the 82495XP objectives is to implement a high performance
second level cache for multiprocessor systems. To fulfill this
objective the 82495XP implements a "write-back" cache with full
support for multiprocessing data consistency. Being a write-back
cache means that the 82495XP may contain data which is not updated in
the main memory. Therefore a mechanism is implemented to ensure that
data read by any system bus master, at any time, is correct.


A key feature for multiprocessing systems is reduction of memory bus
utilization. The memory bus quickly becomes a resource bottleneck
with the addition of multiple processors. The 82495XP cache
consistency mechanism ensures minimal usage of memory bus bandwidth.


The 82495XP allows portions of memory to be defined as
non-cacheable. For the cacheable areas, the 82495XP allows selected
portions to be defined as write-through locations.


The 82495XP protocol is implemented by assigning state bits to each
cached line. Those states depend on both 82495XP data transfer
activities performed as the bus master, and snooping activities
performed in response to snoop requests generated by other memory bus
masters.


4.1 Cache Consistency Protocol 
Model 


The 82495XP consistency protocol is the set of rules which allows the
82495XP to contain data that is not updated in main memory while
ensuring that memory accesses by other devices do not receive stale
data. This consistency is accomplished by assigning a special
consistency state to every cached entry (line) in the 82495XP.


NOTE: 
The following rules apply to memory read and write cycles. All I/O
and special cycles bypass the cache.


The 82495XP protocol consists of 4 states. They define whether a line
is valid (hit or miss), whether it is available in other caches
(shared or exclusive), and whether it has been modified.


The 4 states are:


[I] - INVALID     Indicates that the line is not available in the
                  cache. A read to this line will be a miss and cause
                  the 82495XP to execute a line fill (fetch the whole
                  line and deposit it into the cache SRAM). A write
                  to this line will cause the 82495XP to execute a
                  write-through cycle to the memory bus and in some
                  circumstances initiate an ALLOCATION.


[S] - SHARED      This state indicates that this line is potentially
                  shared with other caches (the same line may exist
                  in more than one cache). A Shared line can be read
                  out of the cache SRAM without a main memory access.
                  Writing to a Shared line updates the
                  82495XP/82490XP cache, but also requires the
                  82495XP to generate a write-through cycle to the
                  memory bus. In addition to updating main memory,
                  the write-through cycle will invalidate this line
                  in other caches. Since writing to a Shared line
                  causes a write-through cycle, the system can
                  enforce a "write-through policy" to selected
                  addresses by forcing those addresses into the [S]
                  state. This can be done by setting the PWT
                  attribute in the CPU page table or asserting the
                  MWB/WT# pin each time the address is referenced.


[E] - EXCLUSIVE   This state indicates a line which is exclusively
                  available in ONLY this cache, and that this line is
                  NOT MODIFIED (main memory also has a valid copy).
                  Writing to an Exclusive line causes it to change to
                  the Modified state and can be done without
                  informing other caches, so no memory bus activity
                  is generated.


[M] - MODIFIED    This state indicates a line which is exclusively
                  available in ONLY this cache, and is MODIFIED (main
                  memory's copy is stale). A Modified line can be
                  updated locally in the cache without acquiring the
                  memory bus. Because a Modified line is the only
                  up-to-date copy of data, it is the 82495XP's
                  responsibility to flush this data to memory on
                  accesses to it. Flushing of this data to memory
                  will be executed immediately after completion of
                  the current CPU bus cycle.
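

The four states above can be summarized in a small model. The following is only an illustrative Python sketch, not part of the chipset documentation: the names `Mesi`, `is_valid`, `memory_is_stale`, and `write_needs_memory_bus` are hypothetical helpers introduced here to restate the properties listed for each state.

```python
from enum import Enum


class Mesi(Enum):
    """The four consistency states assigned to each cached line."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def is_valid(state: Mesi) -> bool:
    """A line in any state other than Invalid is present in the cache."""
    return state is not Mesi.INVALID


def memory_is_stale(state: Mesi) -> bool:
    """Only a Modified line differs from the copy held in main memory."""
    return state is Mesi.MODIFIED


def write_needs_memory_bus(state: Mesi) -> bool:
    """Writes to Exclusive or Modified lines complete locally; writes to
    Shared or Invalid lines cause a write-through cycle on the memory bus."""
    return state in (Mesi.SHARED, Mesi.INVALID)
```

For example, `write_needs_memory_bus(Mesi.SHARED)` is true, which is the property the text uses to enforce a write-through policy on selected addresses.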


4.2 Basic State Transitions 


This section covers the most common, basic memo- 
ry accesses. The special functions which force a cy- 
cle to be noncacheable, locked, read only, or direct- 
to-Modified are not in use. These might be used, for 
example, in read for ownership and cache to cache 
transfers, and are covered in section 4.3. This basic 
transitions section is divided into two parts: the first 
covers MESI state changes which occur in a CPU/ 
cache core due to its own actions; the second de- 
scribes MESI state transitions in a CPU/cache core 
caused by the actions of other, external devices. 
Figure 4-1 shows a partial state diagram of the MESI 
coherency protocol which includes these basic tran- 
sitions. 


The 82495XP accepts line attributes from the CPU and memory buses.
The 82495XP assumes that all caches on the memory bus have the SAME
number of bytes per line.


4.2.1 TRANSITIONS IN CACHE STATES 
CAUSED BY OWN CPU TRANSACTIONS 


The MESI state of each 82495XP/82490XP cache line changes as the
82495XP/82490XP services the read and write requests generated by
its CPU.


4.2.1.1 Read Hit 


A read hit occurs when the CPU generates a read cycle on its bus, and
the data is present in and returned by the 82495XP/82490XP. The state
of the cache line (M, E, or S) remains unchanged by a read operation
which hits the cache.


4.2.1.2 Read Miss 


A read miss arises when the CPU generates a read and the data is not
present in the 82495XP/82490XP cache: either the tag lookup does not
produce a match, or a match occurs but the data is Invalid. The
82495XP generates a memory access to fetch the data (which is assumed
cacheable for this discussion) and the surrounding data needed to
fill the cache line. This data is placed in the 82495XP/82490XP cache
in an invalid line or (if both lines are valid) replaces the least
recently used line, which is written back to memory if Modified.


The new line is placed in the Exclusive state, unless either the CPU
or memory indicates that it should be a write-through on its next
write access using PWT or MWB/WT#, respectively. If either of these
is asserted, the new line is placed in Shared state. A new line could
also be read in and placed directly into Modified state: see section
4.3.4 for details and use.
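

The choice of fill state on a cacheable read miss can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and boolean parameters are hypothetical, each parameter is true when the corresponding indication is active, and the priority of a write-through indication over DRCTM# is taken from the conditions listed in Table 4-1.

```python
def read_miss_fill_state(pwt: bool, mwt: bool, drctm: bool = False) -> str:
    """Return the MESI state ('M', 'E' or 'S') given to a line filled on a
    cacheable, non-locked, non-read-only read miss.

    pwt   -- the CPU indicates write-through on PWT
    mwt   -- the memory bus indicates write-through on MWB/WT#
    drctm -- DRCTM# is asserted (direct-to-Modified, section 4.3.4)
    """
    if pwt or mwt:
        return "S"  # write-through lines are held in the Shared state
    if drctm:
        return "M"  # skip Exclusive and go directly to Modified
    return "E"      # default: line is not shared, so it becomes Exclusive
```

With no attribute asserted the line is filled as Exclusive, so a later write hit can be completed without any memory bus activity.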


4.2.1.3 Write Hit 


When the CPU generates a write cycle, if the data is present in the
82495XP/82490XP cache, it is updated and may undergo a MESI state
change.


If the hit line is originally in the Exclusive state, it changes to
Modified state upon a write. If the hit line is originally in the
Modified state, it remains in that state. Neither of these cases
generates any bus activity.


A write to a line which is in the Shared state causes the 82495XP to
write the data out to memory as well as update the 82495XP/82490XP
cache. The write to main memory also serves to invalidate any copy of
the data which resides in another cache. The cache line state changes
according to activity on the PWT and MWB/WT# pins. If neither of
these pins is asserted, the write hit line becomes Exclusive. If
either of these pins is asserted, the line is forced to remain
write-through, so the state remains Shared.


An existing line can also be written and forced directly into
Modified state: see section 4.3.4 for details and use.
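

The write hit rules can be condensed into one function. The sketch below is illustrative only: the name `write_hit` and its return convention are assumptions made here, and it deliberately ignores locked cycles and read-only lines, which are covered in section 4.3.

```python
def write_hit(state: str, pwt: bool = False, mwt: bool = False,
              drctm: bool = False):
    """Return (next_state, write_through) for a CPU write that hits the cache.

    state is 'M', 'E' or 'S'. write_through is True when the write must
    also appear on the memory bus.
    """
    if state in ("M", "E"):
        return "M", False          # updated locally, no bus activity
    if state == "S":
        if pwt or mwt:
            return "S", True       # line is forced to remain write-through
        # Neither pin asserted: Exclusive, or Modified if DRCTM# is asserted.
        return ("M" if drctm else "E"), True
    raise ValueError("a write hit requires a valid line (M, E or S)")
```

Note that a Shared line always produces a write-through cycle on its first write; only the resulting state depends on PWT, MWB/WT#, and DRCTM#.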


4.2.1.4 Write Miss 


The CPU generates a write cycle, and the data is not present in the
82495XP/82490XP cache. In a simple write miss, the 82495XP/82490XP
assists the CPU in delivering data to memory, but the data is not
placed in the cache. No cache lines are affected, so no state changes
take place.


4.2.1.5 Write Miss with Allocate 


This is a special case of a write miss where the 
memory location written by the CPU is not currently 
in the 82495XP/82490XP cache, but is brought into 
the cache and updated. Like a regular write miss, 
the 82495XP/82490XP assists the CPU in writing 
the data out to main memory. After the data is writ- 
ten to memory, the 82495XP/82490XP reads back 
the same data following the rules of a read miss, 
above. 


The ability to perform an allocation depends on all of the following
conditions:


— the write is cacheable
— PWT is not asserted (an asserted PWT forces write-through)
— the write is not LOCKed
— the write is to memory (not to I/O)
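

The four conditions above combine as a simple conjunction. In the sketch below the function and parameter names are hypothetical, chosen only to mirror the list; each parameter is a plain boolean.

```python
def can_allocate(cacheable: bool, pwt: bool, locked: bool,
                 to_memory: bool) -> bool:
    """Return True when a write miss may be followed by an allocation.

    cacheable -- the write is cacheable
    pwt       -- PWT is asserted (forces write-through, so no allocation)
    locked    -- the write is a LOCKed cycle
    to_memory -- the write targets memory rather than I/O
    """
    return cacheable and not pwt and not locked and to_memory
```

Any single failing condition turns the cycle into a simple write miss, with no cache line affected.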


4.2.2 TRANSITIONS CAUSED BY OTHER 
DEVICES ON BUS 


MESI state transitions in the 82495XP/82490XP cache of one core
(CPU/82495XP/82490XP) can be induced by actions initiated by other
cores or devices on the shared memory bus. In the following, the
82495XP which is responding to actions of other devices does not
currently own the bus, and may be referred to as a "slave" or, in the
case of snooping, a "snooper". The device which currently owns the
bus is the "master".


4.2.2.1 Snooping 


The master which is accessing data from memory on the bus sends a
request to all caching devices on the bus (snoopers) that they check,
or snoop, their caches for a more recently updated version of the
data being accessed. If one of the snoopers has a copy of the
requested data, it is termed a "snoop hit".


If a snooper has a modified version of the data ("snoop hit to a
Modified line"), it proceeds to generate an "inquire cycle" to the
i860 XP CPU, asking the i860 XP CPU if it also has a Modified copy of
the line (which would be more recently modified than the
82495XP/82490XP's version). The most up-to-date line is written out
by the 82495XP/82490XP to the bus (to main memory or directly to the
requesting master) so that the requesting master can utilize it.


The changes in MESI protocol state in a snooping cache which has a
snoop hit depend on attribute inputs SNPINV and SNPNCA, which are
driven by the master.


The SNPINV input tells a snooping 82495XP/82490XP to invalidate the
line being snooped if hit: the master requesting the snoop is about
to write to its copy of this line and will therefore have the most
up-to-date copy. When SNPINV is asserted on the snoop request, any
snoop hit is placed in Invalid state, and a "back invalidation" is
generated which instructs the CPU to check its cache and likewise
invalidate a copy of the line. When the snooping 82495XP has a snoop
hit to a Modified line and SNPINV was asserted by the bus master, the
back invalidate is combined with the inquire cycle.


The SNPNCA input tells a snooping 82495XP/82490XP whether the
requesting master is performing a Non-Caching Access. If the
requesting master is not caching the data, a snoop hit to a Modified
or Exclusive line can be placed in the Exclusive state: since the
requester isn't caching the line, if the snooper has a future write
hit to the line, an invalidation does not have to be broadcast. If
the requesting master is caching the data, then a snoop hit to a
Modified or Exclusive line must be placed in the Shared state, which
ensures that a future write hit causes an invalidation to other
caches. Note that a snoop hit to a Shared line must remain in the
Shared state regardless of SNPNCA. Also note that an asserted SNPINV
always overrides SNPNCA.
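

The effect of SNPINV and SNPNCA on a snoop hit can be sketched as a single function. This is an illustration only: the name `snoop_hit` and the three-element return value are assumptions introduced here, and the inquire cycle to the CPU is not modelled separately from the write-back.

```python
def snoop_hit(state: str, snpinv: bool, snpnca: bool):
    """Return (next_state, write_back, back_invalidate) for a snoop hit.

    state      -- current state of the snooped line: 'M', 'E' or 'S'
    snpinv     -- the master asserted SNPINV (it will write the line)
    snpnca     -- the master asserted SNPNCA (it will not cache the line)
    write_back -- True when the snooper must supply Modified data to the bus
    """
    if state not in ("M", "E", "S"):
        raise ValueError("a snoop hit requires a valid line (M, E or S)")
    write_back = state == "M"
    if snpinv:
        # SNPINV overrides SNPNCA: invalidate here and in the CPU cache.
        return "I", write_back, True
    if state == "S":
        return "S", False, False   # Shared stays Shared regardless of SNPNCA
    # Modified or Exclusive: keep exclusivity only if the master won't cache.
    return ("E" if snpnca else "S"), write_back, False
```

For instance, `snoop_hit("M", snpinv=False, snpnca=False)` yields a write-back and leaves the line Shared, so that a later write hit is broadcast to the other caches.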


4.2.2.2 Cache Synchronization 


Cache synchronization is performed to bring the main memory
up-to-date with respect to the 82495XP/82490XP. Two mechanisms exist
in the 82495XP/82490XP to accomplish this: FLUSH and SYNC.


A cache flush is initiated by asserting the 82495XP FLUSH# pin. Once
initiated, the 82495XP writes all Modified lines out to main memory,
performing back invalidations and inquire cycles on the CPU. When
completed, all 82495XP/82490XP and CPU cache entries will be in the
Invalid state.


Activation of the SYNC# pin also causes all of the 82495XP's Modified
lines to be written to memory. Unlike the FLUSH# pin, the cache lines
remain valid after the SYNC# process has completed, with Modified
lines changing to the Exclusive state.
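

The difference between the two synchronization mechanisms can be shown on a toy cache. The sketch below is an assumption-laden illustration: the cache is modelled as a plain dictionary mapping a line address to its state letter, and the function names `flush` and `sync` are hypothetical.

```python
def flush(lines: dict) -> list:
    """Model FLUSH#: write back Modified lines, then invalidate every entry.

    Returns the sorted addresses of the lines written back to memory.
    """
    written_back = sorted(addr for addr, s in lines.items() if s == "M")
    for addr in lines:
        lines[addr] = "I"
    return written_back


def sync(lines: dict) -> list:
    """Model SYNC#: write back Modified lines but keep all lines valid.

    Modified lines become Exclusive; other states are left unchanged.
    Returns the sorted addresses of the lines written back to memory.
    """
    written_back = sorted(addr for addr, s in lines.items() if s == "M")
    for addr in written_back:
        lines[addr] = "E"
    return written_back
```

Both operations produce the same memory bus write-backs; they differ only in the states left behind in the cache.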
4.3 The Effects of Special Cycles on 
MESI States 


4.3.1 NON-CACHEABLE ACCESS 


The 82495XP allows cacheability to be determined on both a per page
and per line basis. The page cacheability function is determined by
software, while cacheability on a line-by-line basis is driven by
hardware.


The PCD (Page Caching Disabled) pin is an 82495XP input driven by the
CPU's PCD output, which corresponds to a cacheability bit in the page
table entry of a memory location's virtual address. If the PCD bit is
asserted when the CPU presents a memory address, that location will
not be cached in either the 82495XP or the CPU.


MKEN# is an 82495XP input which connects to the memory bus controller
or the memory bus. MKEN# inactive prevents the caching of the memory
location in both the 82495XP and the CPU, affecting only the current
access.


If a read miss is indicated non-cacheable by either of these, the
line is not placed in the 82495XP/82490XP or CPU cache, and no cache
states are modified. On a write miss, a noncacheable indication from
either input forces a write miss without allocation. Note that if the
82495XP/82490XP already has a valid copy of the line, the PCD
attribute from the CPU is ignored.


Figure 4-1. Major State Transitions 
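

How PCD and MKEN# combine on a read can be sketched as follows. The function name and the returned labels are hypothetical, chosen for illustration; `mken` is true when MKEN# is active (cycle cacheable), and locked and read-only cycles are left out.

```python
def classify_read(state: str, pcd: bool, mken: bool) -> str:
    """Classify a non-locked CPU read by the cacheability inputs.

    state -- state of the addressed line: 'M', 'E', 'S' or 'I'
    pcd   -- the CPU asserted PCD for this access
    mken  -- MKEN# is active (the memory bus allows caching)
    """
    if state != "I":
        return "hit"                 # PCD is ignored when the line is valid
    if pcd or not mken:
        return "non-cacheable read"  # no line fill, no state change
    return "line fill"
```

Either input alone is enough to make a miss non-cacheable, which matches the rule that PCD overrides an asserted MKEN#.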


4.3.2 READ ONLY ACCESSES: MRO# 


The MRO# (Memory Read Only) input is driven by the memory bus to
indicate that a memory location is read only.


When asserted during a read miss line fill, MRO# causes the line to
be placed in the 82495XP/82490XP cache in the Shared state and also
sets a read-only bit in the cache tag. MRO# accesses are not cached
in the CPU. On subsequent write hits to a read-only line, the write
is actually written through to memory without updating the
82495XP/82490XP line, which remains in the Shared state with the
read-only bit set.
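

The behavior of a write hit to a read-only line can be illustrated with a toy model. Everything named here (`ReadOnlyLine`, `write_hit_read_only`, the dictionary used as memory) is a hypothetical construction for the example, not a description of the hardware data path.

```python
from dataclasses import dataclass


@dataclass
class ReadOnlyLine:
    """A line filled with MRO# asserted: Shared, with the read-only bit set."""
    data: int
    state: str = "S"
    read_only: bool = True


def write_hit_read_only(line: ReadOnlyLine, memory: dict,
                        addr: int, value: int) -> None:
    """Handle a CPU write that hits a read-only line."""
    memory[addr] = value  # the write goes through to memory
    # line.data, line.state and line.read_only are deliberately left
    # unchanged: the cached copy is not updated and stays Shared.
```

After such a write the cached copy and memory may hold different values, which is consistent with the location being treated as read only by the cache.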


4.3.3 LOCKED ACCESSES: LOCK# 


The LOCK# signal driven by the CPU indicates that the requested cycle
should lock the memory location for an atomic memory access. Because
locked cycles are used for interprocessor and intertask
synchronization, all locked cycles will appear on the memory bus.


On a locked write, the 82495XP treats the access as a write-through
cycle, sending the data to the memory bus, updating memory and
invalidating other cached copies. If the data is also present in the
82495XP/82490XP cache, it is updated but its M, E, or S state remains
unchanged.


For locked reads, the 82495XP assumes a cache miss and starts a
memory read cycle. If the data resides in the 82495XP/82490XP, the
M, E, or S state of the data remains unchanged. If the requested data
is in the 82495XP/82490XP and is in the Modified state when the
memory bus returns data, the 82495XP will use the 82490XP data and
ignore the memory bus data.


LOCKed read and write cycles which miss the 82495XP/82490XP cache are
noncacheable in both the 82495XP/82490XP and CPU.
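

The locked-cycle rules can be summarized in two small functions. This is a sketch under stated assumptions: the names and return conventions are hypothetical, and the state is given as one of the letters 'M', 'E', 'S', or 'I'.

```python
def locked_read_source(state: str) -> str:
    """Return where the data for a LOCKed read comes from.

    A memory bus read cycle always runs. Only a Modified line overrides
    the data returned by the memory bus.
    """
    return "cache" if state == "M" else "memory"


def locked_write(state: str):
    """Return (next_state, write_through, cache_updated) for a LOCKed write.

    The write always appears on the memory bus. A valid line is updated
    but keeps its state; a miss is not allocated.
    """
    return state, True, state != "I"
```

In both cases the line's state is left as it was, so locked cycles never change MESI states by themselves.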
4.3.4 FORCING LINES DIRECTLY TO MODIFIED: 
DRCTM# 


The DRCTM# (Direct To Modified) pin is an input which informs the
82495XP to skip the Exclusive state and place a line directly in the
Modified state. The signal can be asserted during 82495XP/82490XP
reads of the memory for special 82495XP/82490XP data accesses like
read-for-ownership and cache-to-cache transfer. The signal can also
be asserted during writes, for purposes of cache tracking.


4.4 State Tables 


Lines cached by the 82495XP can change state as a result of either
CPU bus activity (which sometimes requires the 82495XP to become a
memory bus master) or memory bus activity generated by other system
masters (snooping).


State transitions are affected by the type of CPU/memory bus
transaction (reads, writes) and by a set of external input signals
and internally generated variables. In addition, the 82495XP will
drive certain CPU/memory bus signals as a result of the consistency
protocol.


4.4.1 CPU BUS 


— PWT (Page Write Through, PWT input pin): Indicates a CPU bus
  write-through request. Activated by the i860 XP CPU PWT pin. This
  signal affects line fills and will cause a line to be put in the
  [S] state if active. The 82495XP will NOT execute ALLOCATIONS (line
  fills triggered by a write) for write-through lines. If PWT is
  asserted, it overrides a write-back indication on the MWB/WT# pin.


— PCD (Page Cacheability Disable, PCD input pin): Indicates that the
  accessed line is noncacheable. If PCD is asserted, it overrides a
  cacheable indication from an asserted MKEN#.


— NWT (i860 XP CPU Write-Through Indication, 82495XP's WB/WT# output
  pin): When low, forces the i860 XP CPU to keep the accessed line in
  the SHARED state. Write-back mode (WB = 1) will be indicated by the
  !NWT notation. In those cases the i860 XP CPU is allowed to go into
  exclusive states [E], [M]. NWT is normally active unless explicitly
  stated.


— KEN (CPU caching enable, KEN# output pin): When active, indicates
  that the requested line can be cached by the CPU 1st level cache.
  KEN is normally active unless explicitly stated.
4.4.2 MEMORY BUS 


— MWT (Memory Bus Write-Through Indication, MWB/WT# input pin): When
  active, forces the 82495XP to keep the accessed line in the SHARED
  state. Write-back mode (MWB = 1) will be indicated by the !MWT
  notation. In those cases the 82495XP is allowed to go into
  exclusive states [E], [M].


— DRCTM (Memory Bus Direct To [M] Indication, DRCTM# input pin): When
  active, forces skipping of the [E] state and direct transfer to
  [M].


— MKEN (Memory Bus Cacheability Enable, MKEN# input pin): When
  active, indicates that the memory bus cycle is cacheable.


— MRO (Memory Bus Read-Only Indication, MRO# input pin): When active,
  forces the line to be READ-ONLY.


— MTHIT (Tag Hit, MTHIT# output pin): Activated by the 82495XP during
  snoop cycles; indicates that the current snooped address hits the
  82495XP cache.


— MHITM (Hit to a line in the [M] state, MHITM# output pin):
  Activated by the 82495XP during snoop cycles; indicates that the
  current snooped address hits a modified line in the 82495XP cache.


— SNPNCA (Non-caching device access): When active, indicates to the
  82495XP that the current bus master is a non-caching device.


— SNPINV (Invalidation): When active, indicates to the 82495XP that
  the current snoop cycle will invalidate that address.


As a function of state changes the 82495XP may execute the following
cycles:


BINV: Execution of a CPU Back Invalidation Cycle (snoop with INV
active).


INQR: Execution of an i860 XP CPU Inquire Cycle (1).


WBCK: 82495XP Write-Back Cycle. This is a memory bus write cycle
generated by the 82495XP when MODIFIED data cached in the 82495XP
needs to be copied back into main memory. A write-back cycle affects
a complete 82495XP line.


WTHR: 82495XP Write-Through Cycle. This is a system write cycle in
response to a processor write. It may or may not affect the cache
SRAM (update). In a write-through cycle, the 82495XP drives the
memory bus with the same address, data, and control signals as the
CPU does on the CPU bus. Main memory is updated, and other caches
invalidate their copies.


RTHR: 82495XP Read-Through Cycle. This is a special cycle to support
locked reads to lines that hit the 82495XP cache. The 82495XP will
request a memory bus cycle for lock synchronization reasons. Data
will be supplied from the bus, except for lines in the [M] state,
for which data is supplied from the cache.


LFIL: 82495XP Cache Line Fill. The 82495XP will generate memory bus
cycles to fetch a new line and deposit it into the cache.


RNRM: 82495XP Read Normal Cycle. This is a normal read cycle which
will be executed by the 82495XP for non-cacheable accesses.


SRUP: 82495XP SRAM Update. Occurs any time new information is placed
in the 82495XP cache. An SRAM update is implied in the LFIL cycle.


ALLOC: 82495XP Allocation. A write miss cycle that has been
determined to be cacheable, so the 82495XP issues a line read.


NOTE: 
1. An inquire cycle may be executed with INV active, performing a
back-invalidation simultaneously.


4.4.3 TAG STATE 


— TRO (Tag Read Only, 82495XP tag bit): This bit, when set, indicates
  that the 1 or 2 lines associated with this tag are Read-Only lines.


STATE TABLES 
Table 4-1. Master 82495XP Read Cycle 


Condition: Next State | Mem Bus Activity | CPU Bus Activity | Comments

!LOCK: M | — | !NWT | Normal Read Hit [M]
LOCK: M | RTHR | | Read Through Cycle, Data From Array
!LOCK: E | — | NWT | Normal Read Hit [E]
LOCK: E | RTHR | !KEN | Read Through Cycle, Data From Memory
!LOCK.!TRO: S | — | NWT | Normal Read Hit [S]
!LOCK.TRO: S | | !KEN | Normal Read to Read-Only sector. Stays in [S] state and deactivates KEN to prevent CPU from caching line
LOCK: S | RTHR | !KEN | Read Through Cycle, Data From Memory
PCD + !MKEN + LOCK: I | RNRM | !KEN | Non-Cacheable Read, Locked cycles
!PCD.MKEN.!LOCK.MRO: S | LFIL | !KEN | Cacheable read, Read-Only. Fill line to 82495XP. Do not allow CPU to cache line by deactivating KEN#. Set the 82495XP's TRO bit to indicate the sector read-only attribute
!PCD.MKEN.!LOCK.!MRO.(PWT + MWT): S | LFIL | NWT | Cacheable Reads, forced Write-Through
!PCD.MKEN.!LOCK.!MRO.!PWT.!MWT.!DRCTM: E | LFIL | NWT | Line not shared, thus enabling the 82495XP to move into an exclusive state
!PCD.MKEN.!LOCK.!MRO.!PWT.!MWT.DRCTM: M | LFIL | NWT | As before with direct [M] state transfer. Keep i860 XP CPU in Write Through mode
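

The condition column of the state tables uses a compact Boolean notation. The sketch below evaluates such an expression for a given set of signal values. It rests on an assumption that the notation follows the usual convention, with "!" as NOT, "." as AND, and "+" as OR, with "." binding tighter than "+"; the function name `evaluate` and the dictionary of signals are hypothetical.

```python
import re


def evaluate(condition: str, signals: dict) -> bool:
    """Evaluate a condition such as '!PCD.MKEN.!LOCK.(PWT + MWT)'.

    signals maps each signal name to True (active) or False (inactive).
    Assumed notation: '!' is NOT, '.' is AND, '+' is OR.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|[!.+()]", condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = take()
        if tok == "!":
            return not factor()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        return bool(signals[tok])

    def term():
        value = factor()
        while peek() == ".":
            take()
            rhs = factor()       # always parse the operand, then combine
            value = value and rhs
        return value

    def expr():
        value = term()
        while peek() == "+":
            take()
            rhs = term()
            value = value or rhs
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

A table row applies when its condition evaluates to true; the state named after the colon is then the next state of the line.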
Table 4-2. Master 82495XP Write Cycle 


Condition: Next State | Activity | Comments

 | | Write hit. Write to cache. Allow i860 XP CPU to perform internal write cycles (enter into [E], [M] states).
 | | Locked Cycle. Write-Through updating cache SRAM. Most updated copy of the line is still owned by 82495XP. All:
 | WTHR | Read-Only. Write cycle with write-through attribute from CPU or Memory Bus. Locked Cycles.
!TRO.(PWT + MWT + LOCK): S | SRUP, NWT | Not Read-Only. Write cycle with write-through attribute from CPU or Memory Bus. Locked Cycles.
!TRO.!PWT.!LOCK.!MWT.!DRCTM: E | SRUP, NWT | Not Read-Only. No write-through cycle, no lock request; allow going into exclusive state.
!TRO.!PWT.!LOCK.!MWT.DRCTM: M | SRUP, NWT | Not Read-Only. No write-through cycle, no lock request; allow going into exclusive state. DRCTM forces final state to M.
 | | Write Miss Non-Cacheable, Write-Through, locked cycle or Read-Only.
!PCD.MKEN.!PWT.!LOCK.!MRO: I | | Write Miss with allocation. After the write cycle, a line fill (allocation) is scheduled.
!PCD.MKEN.!PWT.!LOCK.MRO: S | | If MKEN and MRO are asserted, an allocation to the [S] state will occur.


Allocation Final State (as a function of line fill attributes):

MWT: S
!MWT.!DRCTM: E
!MWT.DRCTM: M


NOTE: 
The WB/WT# pin will only be activated for 82495XP lines that are in
the [M] state. In this state, the 82495XP always assumes that the
line owner MAY be the i860 XP CPU. In all other states the i860 XP
CPU will be forced to perform Write-Through cycles. This mechanism
ensures that any i860 XP CPU write cycle is seen at least once on the
CPU Bus. Allocations, which are consequences of write misses, will
disregard the MKEN# and MRO# attributes during the line fill. In
other words, once an allocation is scheduled, it cannot be cancelled.
Table 4-3. Snooping 82495XP without Invalidation Request

Columns: Present State; Condition: Next State; Mem Bus Activity; CPU Bus Activity; Comments.

[M]  !SNPNCA: S, SNPNCA: E.  Snoop hit to modified line. 82495XP indicates tag hit and modified hit. 82495XP schedules flushing of the modified line to memory. If non-cacheable device, stay in [E] state.

[E]  !SNPNCA: S, SNPNCA: E.  If snooping by cacheable device, indicate MTHIT and go to shared state. If no caching device, only indicate MTHIT, stay exclusive.


NOTE:
Usage of DRCTM# to avoid [E] states may be in conflict with the SNPNCA cycle attribute. Note in the table that snoops
with SNPNCA may cause an [E] state transition.


Table 4-4. Snooping 82495XP with Invalidation Request

Columns: Pres. State; Next State; Mem Bus Activity; CPU Bus Activity; Comments.

[M]  CPU bus activity: INQR, BINV.  Snoop hit to modified line. 82495XP indicates tag hit and modified hit. 82495XP schedules flushing of the modified line to memory. Invalidate CPU.

S/E  Next state: I.  Mem bus activity: MTHIT.  CPU bus activity: BINV.  Indicate tag hit, invalidate 82495XP, CPU lines.


Table 4-5. SYNC Cycles

Columns: Pres. State; Next State; CPU Bus Activity; Mem Bus Activity; Comments.

[M]  Get modified data from i860 XP CPU, flush to memory.
Other states  Memory already synchronized.


NOTE:
Usage of DRCTM# to avoid [E] states may be in conflict with the SYNC cycle. Note in the table that SYNC cycles move an
[M] state line to [E].


Table 4-6. FLUSH Cycles

Columns: Pres. State; Next State; Mem Bus Activity; CPU Bus Activity; Comments.

[M]  Mem bus activity: WBCK.  CPU bus activity: INQR, BINV.  Flush and invalidate i860 XP CPU.
Other valid states (two rows)  CPU bus activity: BINV.  Invalidate i860 XP CPU.


5.0 CONFIGURATIONS 


The 82495XP/82490XP cache system was designed
to fit a variety of applications. For the greatest
performance, each application requires the
82495XP/82490XP to be configured differently. The
82495XP/82490XP therefore has many possible
configurations that are set on RESET and affect the
82495XP/82490XP architecture, operation, and
electrical characteristics.


5.1 Physical Cache 


The physical configurations of the 82495XP/82490XP
consist of parameters that alter the
82495XP/82490XP basic architecture. These are
line ratio, tag size, lines per sector, bus width, and
cache size. These parameters are sampled at the
falling edge of RESET and are not dynamically
changeable.

Because of cache constraints, choosing
one parameter limits the flexibility of other parameters.
The following table summarizes the possible
i860 XP CPU basic cache configurations. CFG0-CFG2
are multiplexed to select one of 5 possible
line ratio/tag size/lines per sector configurations.
This information is automatically passed from the
82495XP to 82490XP during RESET. CFG0-CFG3
must be valid at least 10 clocks before RESET's
falling edge.


Figure 5-1. 82495XP/82490XP Configurations
(The figure tabulates the number of cache devices for MEM BUS = 64 Bits and MEM BUS = 128 Bits, with some combinations marked Not Supported. LR = 82495XP/CPU Line Ratio; L/S = 82495XP Lines/Sector; cache devices are 2, 4, or 8 bits wide.)
Table 5-1. CFG Configuration Inputs


5.1.1 LINE RATIO (LR) 


Line Ratio (LR) is the ratio of the 82495XP/82490XP
cache line size to the CPU cache line size. For example,
if LR=2 then the 82495XP/82490XP line
size is 64 bytes. This information is also used to
determine the number of back invalidations or inquire
cycles to the i860 XP CPU.
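
As a worked example of the line-ratio arithmetic, the sketch below derives the second-level line size from LR. Two assumptions are made that the text only implies: the CPU line size of 32 bytes is inferred from "LR=2 gives 64 bytes", and one back invalidation or inquire cycle per CPU line covered is assumed. The function names are hypothetical.

```python
CPU_LINE_BYTES = 32  # assumption: inferred from the LR=2 -> 64 bytes example


def cache_line_bytes(line_ratio: int) -> int:
    """82495XP/82490XP line size in bytes for a given line ratio (LR)."""
    if line_ratio < 1:
        raise ValueError("line ratio must be a positive integer")
    return line_ratio * CPU_LINE_BYTES


def cpu_lines_per_cache_line(line_ratio: int) -> int:
    """Assumed number of back invalidations or inquire cycles needed to
    cover one 82495XP/82490XP line (one per CPU line it spans)."""
    if line_ratio < 1:
        raise ValueError("line ratio must be a positive integer")
    return line_ratio
```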


5.1.2 TAG SIZE


The 82495XP/82490XP cache tag size may be 4K 
or 8K tag entries. By reducing tag size, the line ratio 
(LR) can be doubled without a change in cache size. 


5.1.3 LINES PER SECTOR (L/S) 


The 82495XP/82490XP may be non-sectored (L/S
= 1) or contain two lines per sector (L/S = 2). If
L/S = 2, then the 82495XP contains one tag for two
consecutive cache lines and each cache line has its
own set of MESI state bits. This allows just one line
to be filled on replacements or written back on
snoop hits. Both lines are written back during
replacements, if both are modified.


5.1.4 BUS SIZE 

The 82495XP/82490XP supports 64 and 128 bit 
memory bus widths for the i860 XP CPU. 

5.1.5 CACHE SIZE 


The 82495XP/82490XP may be configured to be
256K or 512K. Cache size is a direct result of the
number of 82490XP devices used. It takes 8
82490XPs to make a 256K byte cache and 16
82490XPs for a 512K cache.
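
The device counts above imply a fixed data capacity per 82490XP. The following sketch simply encodes the two supported sizes and derives that per-device figure; the names are hypothetical, and parity devices (Section 5.3.1) are not counted.

```python
# Supported cache sizes (in Kbytes) mapped to the number of 82490XP devices.
DEVICES_FOR_CACHE_KBYTES = {256: 8, 512: 16}


def devices_required(cache_kbytes: int) -> int:
    """Number of data 82490XP devices for a supported cache size."""
    try:
        return DEVICES_FOR_CACHE_KBYTES[cache_kbytes]
    except KeyError:
        raise ValueError("supported cache sizes are 256K and 512K") from None


def bytes_per_device(cache_kbytes: int) -> int:
    """Data bytes stored per 82490XP, derived from the figures above."""
    return cache_kbytes * 1024 // devices_required(cache_kbytes)
```

Both supported sizes work out to the same 32 Kbytes per device, which is why doubling the cache size doubles the device count.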


5.1.6 FUNCTION AND ADDRESS 
CONNECTIONS (CFA0-CFA6) 


The following table lists which address lines should
be connected to each of the CFA0-CFA6 lines for
each cache configuration. CFA0-CFA6 provide the
82495XP with proper multiplexed addresses for
each of the possible cache configurations. Depending
on the mode selected, either CFA5 or CFA4 will
operate as the 82495XP's CTYP input. This input is
connected to the i860 XP CPU's CTYP output.
Table 5-2. CFA Address Connections


5.2 Cache Modes 


Cache modes are ways of configuring the
82495XP/82490XP to operate differently. These options
are all sampled at RESET and are not dynamically
changeable. If some of these configuration options
share a pin, such as the 82495XP's SYNC#
and MEMLDRV, the configuration option must meet
a specific setup and hold time to RESET's falling
edge. For the 82495XP, setup time is usually 4
clocks, and for the 82490XP, setup time is usually 1
clock. For both parts, the configuration option must
be held until RESET is detected low.


Figure 5-2. Configuration Input Sampling


5.2.1 MEMORY BUS MODES 


The 82495XP/82490XP may be configured to have
a clocked or strobed memory bus. Memory bus
mode is selected by the 82490XP MSTBM pin (same
as MCLK pin). If MSTBM is strapped high, the
82490XPs operate in strobed mode. If MSTBM is
toggling, i.e., it is connected to the memory bus clock,
the 82490XPs operate in clocked mode. MCLK need
not be synchronous to CLK.


5.2.2 SNOOPING MODES 


The 82495XP/82490XP supports three snooping
modes: synchronous, clocked, and strobed. Snooping
mode is selected by the SNPMD (same as
SNPCLK) pin. If SNPMD is low the 82495XP snoops
synchronously. If SNPMD is high the 82495XP
snoops in strobed mode. If SNPMD is toggling,
clocked mode is selected and SNPMD becomes a
snoop clock source, SNPCLK, which clocks in the
snoop requests.

These three snooping modes only alter the way the
memory bus controller may initiate a snoop request
to the 82495XP. The 82495XP response is always
synchronous to the CPU CLK.
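
The SNPMD selection rule can be sketched as a function over a series of observed pin levels. This is a simplified model for illustration only: the `SnoopMode` enum, the function name, and the idea of passing a list of 0/1 samples are all hypothetical, and real timing requirements are not modeled.

```python
from enum import Enum


class SnoopMode(Enum):
    SYNCHRONOUS = "synchronous"
    CLOCKED = "clocked"
    STROBED = "strobed"


def snoop_mode(snpmd_samples):
    """Classify the snooping mode from observed SNPMD levels (0 or 1).

    Constantly low -> synchronous, constantly high -> strobed,
    toggling (both levels seen) -> clocked.
    """
    levels = set(snpmd_samples)
    if levels == {0}:
        return SnoopMode.SYNCHRONOUS
    if levels == {1}:
        return SnoopMode.STROBED
    if levels == {0, 1}:
        return SnoopMode.CLOCKED
    raise ValueError("samples must be a non-empty sequence of 0s and 1s")
```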


5.2.3 BUS DRIVERS 


The 82495XP/82490XP provide 2 types of memory
bus drivers: high capacitance drivers and low capacitance
drivers. The high capacitance drivers are selected
by driving both the 82495XP and 82490XP
MEMLDRV pins low at RESET. Similarly, the low
capacitance drivers are selected with MEMLDRV high.

With C490LDRV the 82495XP also provides two
types of drivers when driving the 82490XPs. Refer
to the interface document to determine C490LDRV.


5.2.4 STRONG/WEAK WRITE ORDERING 


If the 82495XP pin WWOR# is sampled low at
RESET, the 82495XP enforces weak write-ordering.
If sampled high, the 82495XP enforces strong
write-ordering. Strong write-ordering prevents the
82495XP from completing a write cycle that would
go to the [M] state if a posted write is pending (has not
been granted the bus with BGT#). By doing this,
strong ordering ensures that write cycles from the
CPU are written to memory in the same order that
they appear in the i860 XP CPU program.


5.2.5 i860™ XP CPU PFLD SUPPORT


The i860 XP microprocessor executes PFLD (Pipelined
Floating-Point Load) instructions to implement
special data handling, typically for vector operations.
This instruction allows loading of data through a
FIFO pipeline, to hide memory latency. The i860 XP
CPU does not cache data returned by a PFLD cycle.


The 82495XP can be configured to decode the
i860 XP microprocessor's PFLD cycles. The
82495XP supports 3 operational modes for PFLD
cycle decoding:

Mode #1. PFLD cycles are cached in the 82495XP.
This mode is used in applications that
can fit entirely in the 82495XP/82490XP
cache. The 82495XP treats PFLD cycles
as normal read cycles.
Mode #2. PFLD cycles are not cached in the
82495XP, without an external PFLD extension
FIFO.
This mode is used when applications are
too large to fit in the 82495XP/82490XP
cache. The 82495XP treats PFLD cycles
as noncacheable, using the same protocol
as cycles with PCD=1 (if data is already
cached, it will be supplied from the
cache).


Mode #3. PFLD cycles are not cached in the 82495XP,
with an external PFLD extension FIFO.
This mode allows the PFLD FIFO to be
extended beyond the three stages built
into the i860 XP CPU by adding external
FIFO hardware. The 82495XP treats
PFLD cycles in the same manner as its
treatment of LOCKed cycles (all cycles
go to the bus, even if data is already present
in the cache). To support the external
FIFO, the 82495XP identifies PFLD cycles
by asserting its FPFLD output. For
proper operation, data which can be accessed
by PFLD must never be in the
cache in the Modified state, and software
must be aware of the length of the combined
PFLD pipeline. Because this mode
is not software transparent, it must be
used with extreme care.


The choice of PFLD mode is largely application dependent.
The PFLD mode of the 82495XP is selected
by configuration pins FPFLDEN and NCPFLD#,
which are sampled at RESET. FPFLDEN shares a
pin with FPFLD, and NCPFLD# shares a pin with
FLUSH#. Depending on the PFLD mode, data for
reads will either be supplied to the CPU from the
82495XP, or from the memory bus. Table 5-3 summarizes
the 82495XP's support for i860 XP CPU
PFLD cycles.
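
The behavioral differences between the three PFLD modes can be summarized in a small lookup. This sketch encodes only what the mode descriptions state (whether PFLD data is cached, and whether a hit is served from the cache); the pin encodings on FPFLDEN and NCPFLD# are not modeled, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PfldMode:
    number: int
    cached_in_82495xp: bool      # is PFLD data allocated in the cache?
    hit_served_from_cache: bool  # on a cache hit, does data come from the cache?


PFLD_MODES = {
    1: PfldMode(1, cached_in_82495xp=True, hit_served_from_cache=True),
    2: PfldMode(2, cached_in_82495xp=False, hit_served_from_cache=True),
    3: PfldMode(3, cached_in_82495xp=False, hit_served_from_cache=False),
}


def pfld_data_source(mode: int, cache_hit: bool) -> str:
    """Where read data for a PFLD cycle comes from in this simplified model."""
    m = PFLD_MODES[mode]
    if cache_hit and m.hit_served_from_cache:
        return "cache"
    return "memory bus"
```

In Mode #3 every PFLD cycle goes to the bus, so `pfld_data_source(3, True)` returns `"memory bus"`; this is why PFLD-accessible data must never be in the Modified state in that mode.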
Table 5-3. 82495XP PFLD Modes

Columns include: Mode #; Data Supplied From; Line Fill.


5.3 82490XP Bus Configuration 


The 82490XP needs to be configured so it knows to
drive 4 or 8 MDATA lines and whether it should do 4
or 8 memory transfers per line fill. This is done
through the MX4/MX8# and the MTR4/MTR8#
configuration inputs. For a given line ratio (memory
bus line size / CPU line size), they should be sampled
as follows:


Table 5-4. MX/MTR Configurations

Columns: Ratio | MX8# | MTR8# | I/O | I/O. Some combinations are marked Illegal Mode.


5.3.1 82490XP PARITY CONFIGURATION 


An 82490XP may be designated as a parity device.
This is done by strapping the PAR# pin low. In this
configuration CDATA[0:3] are used to store 4 parity
bits, and CDATA[4:7] are used as 4 bit enables. The
four bit enables allow the writing of individual parity
bits.


Every mode and configuration of a non-parity
82490XP may be used and selected on the parity
82490XP device. The parity configurations
are as follows:


Table 5-5. Parity Configurations

Columns: 82490XP I/O bits (CPU/Mem); Number of Parity Devices.


5.3.2 CPU 82490XP ADDRESS 
CONFIGURATIONS 


The 82490XP Address inputs (A) are connected to
the CPU address lines (CA) according to the cache
size:


Table 5-6. 82490XP Address Connections

Columns: cache size; 82490XP Address Pins (with the CPU address line connected to each pin).

NC = No Connect.
6.0 CACHE OPERATION


Figure 6-1. Memory Bus Controller Interface Model
(Blocks shown: CPU I/F and 82495 I/F, both CLK synchronous; MEMBUS I/F, CLK/SNPCLK synchronous or strobed; the MBC with Cycle Request and Cycle Progress (BGT#, SWEND#) signals, BEs and BRDYx; Snoop Req/Ack, Data In/Out, Latch Control, Cycle Control and Address paths; optional XCVRs/latches; the memory bus.)


Figure 6-1 shows the memory bus controller (MBC)
interface model. The memory bus controller interfaces
to the i860 XP CPU, 82495XP, 82490XP, and
memory bus. The MBC interface was defined with a
minimal set of assumptions as to the memory bus
implementation. The chipset was designed to enable
flexibility in the design of a memory bus and controller.


The 82495XP requests control of the memory bus
by signalling the memory bus controller. The memory
bus controller is responsible for arbitrating and
granting the bus to the 82495XP. Once granted, the
memory bus controller is responsible for executing
the requested cycle, snooping the other caches, and
ending the cycle. The 82495XP supports different
modes of snooping, different modes of memory bus
operation, and various special cycles. Memory Bus
Controller design dictates which of these features
are used, and exactly how they are used.
6.1 Cycle Attribute and Progress


Figure 6-2. Cycle Attribute and Progress Signals
(Cycle Request: CADS#, SNPADS#, CDTS#. Cycle Progress: BGT#, KWEND# (ATTRIB: MKEN#, MRO#), SWEND# (ATTRIB: MWB/WT#, DRCTM#), CNA#, CRDY#.)


CADS# indicates the start of the cycle address
phase. CDTS# tracks CADS# and indicates the
start of the cycle data phase. For Read cycles it
indicates that starting in the next CLK the CPU data
bus is in read mode under the control of the MBC
until the last BRDY#. In Read cycles, if the MBC
already owns the CPU data bus, CDTS# will be activated
with CADS#. For ALLOCATE cycles the MBC
does not need the CPU data bus, therefore CDTS#
is activated together with CADS#.


For Write cycles CDTS# indicates that the first piece
of data is available on the memory bus. For write-back
cycles CDTS# indicates that all data is available
(write-back buffer or snoop buffer loaded with
correct write-back data).


In response to the cycle request, the memory bus
controller returns cycle progress signals. All
cycle progress signals are sampled ONCE in specific
windows and then ignored until CRDY# of the
corresponding cycle. BGT# indicates a commitment
by the memory bus controller to complete the cycle
execution on the memory bus. Up until this point the
82495XP owns the cycle. This means that intervening
snoop-write-backs will abort it and the 82495XP
re-issues the cycle to the MBC. There is only one
case where the 82495XP will issue a new, not a
re-issued, cycle: if the original CADS# operation is a
write-back cycle, and the interrupting snoop cycle
hits that write-back buffer, then the subsequent
CADS# will be for a completely new cycle (not a
re-issuing of the interrupted CADS# operation).
After BGT# the memory bus controller owns the cycle.
The 82495XP assumes the cycle will terminate
and will not re-issue it on snoop-write-backs. Following
BGT# comes KWEND#, which indicates that the
cacheability window is closed and that the 82495XP
can sample the MKEN# and MRO# attributes. These
indicate cacheability and read-only status, respectively,
to the 82495XP. These attributes can be determined by
decoding the 82495XP address. Based on those attributes
the 82495XP executes Allocations,
Line-fills, Replacements, etc.


Following KWEND#, SWEND# is activated. It indicates
that the Snoop Window is closed. The
82495XP samples the MWB/WT# and DRCTM# attributes.
These attributes are determined by snooping
the other caches in the system. At this point the
82495XP updates its TAGRAM state related to the
line access in progress.


Lastly the MBC issues CRDY#, which indicates to
the 82495XP the end of the transaction data phase.


The 82495XP allows memory bus pipelining by providing
CNA#, which allows the MBC to request a
new address phase before the conclusion of the current
data phase. The 82495XP supports a one-level-deep
address pipeline on the Memory Bus.
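
The ordering of the cycle progress signals described in this section (BGT#, then KWEND#, then SWEND#, then CRDY#) can be checked with a few lines of code. This is only a sketch under simplifying assumptions: events are given as a list of signal names in the order they were sampled, KWEND# and SWEND# may be absent for cycles that do not sample them, and timing windows and pipelining via CNA# are not modeled. The function names are hypothetical.

```python
PROGRESS_ORDER = ["BGT#", "KWEND#", "SWEND#", "CRDY#"]


def is_valid_progress(events):
    """True when the sampled events follow BGT# -> KWEND# -> SWEND# -> CRDY#.

    BGT# and CRDY# must be present; each signal may appear at most once.
    Raises ValueError for a name that is not a cycle progress signal.
    """
    positions = [PROGRESS_ORDER.index(e) for e in events]
    in_order = all(a < b for a, b in zip(positions, positions[1:]))
    return in_order and "BGT#" in events and "CRDY#" in events


def cycle_owner(bgt_sampled: bool) -> str:
    """Before BGT# the 82495XP owns the cycle; afterwards the MBC does."""
    return "MBC" if bgt_sampled else "82495XP"
```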


6.2 Snoop Operations


The 82495XP provides the capability of snooping
operations on the memory bus to ensure cache consistency.
A snoop operation consists of two phases:
1) initiation phase and 2) response phase.


Figure 6-3. 82495XP Snooping Operations


During the initiation phase the MBC provides the
82495XP with the snoop address information. During
the response phase the 82495XP provides the
snoop status information.
6.2.1 SNOOP INITIATION PHASE


The 82495XP provides three modes for initiating
snoops:


1. Strobed: the falling edge of SNPSTB# is used.
2. Clocked: SNPSTB# is sampled with SNPCLK.
3. Synchronous: SNPSTB# is sampled with CLK.


These three snooping modes are configured as follows:


1. Strobed: The SNPCLK[SNPMD] signal must be
strapped high.


2. Clocked: The SNPCLK[SNPMD] signal must be
connected to the snoop clock source.


3. Synchronous: The SNPCLK[SNPMD] signal
must be strapped low.


NOTE:

The 82495XP samples the SNPCLK[SNPMD] signal
at the falling edge of RESET to determine the
snoop mode. If a rising edge occurs on
SNPCLK[SNPMD] after RESET has gone inactive,
clocked mode will be selected. Systems using
strobed or synchronous mode must ensure that no
rising edge occurs on SNPCLK[SNPMD] after
RESET has gone inactive.


Figure 6-4 shows the strobed method of snoop initiation.
The memory address, SNPNCA, SNPINV,
and MBAOE# are latched with the falling edge of
SNPSTB#. If MAOE# is sampled active (low),
SNPSTB# will not cause a snoop. The snoop
initiation is recognized by the 82495XP, is synchronized
in the next clock, and causes a snoop in the
following clock.


Figure 6-4. Strobed Snoop Mode
(Signals shown: SNPSTB#, SNPINV, SNPNCA, MSET0-10, MTAG0-11, MCFA0-6, MBAOE#, SNPCYC#, MTHIT#, MHITM#, SNPBSY#.)
Figure 6-5 shows the clocked method of snoop initiation.
The memory address, SNPNCA, SNPINV,
and MBAOE# are latched with the rising edge of
SNPCLK when SNPSTB# is first sampled low.
SNPSTB# must be sampled high for at least one
SNPCLK in order to rearm for another snoop. If
MAOE# is sampled active (low), SNPSTB# will
not cause a snoop. The snoop initiation is recognized
by the 82495XP, is synchronized in the next
clock, and causes a snoop in the following clock.


Figure 6-5. Clocked Snoop Mode
(Signals shown: SNPCLK, SNPSTB#, SNPINV, SNPNCA, MSET0-10, MTAG0-11, MCFA0-6, MBAOE#, SNPCYC#, MTHIT#, MHITM#, SNPBSY#.)
Figure 6-6 shows the synchronous snoop mode. The
memory address, SNPNCA, SNPINV, and MBAOE#
are latched with the rising edge of CLK when
SNPSTB# is first sampled low. SNPSTB# must be
sampled high for at least one CLK in order to rearm
for another snoop. If MAOE# is sampled active
(low), SNPSTB# will not cause a snoop. The
snoop initiation is recognized by the 82495XP, and
causes a snoop in the next clock.


Figure 6-6. Synchronous Snoop Mode
(Signals shown include SNPSTB#, SNPCYC#, MTHIT#, MHITM#, SNPBSY#.)
6.2.2 RESPONSE PHASE


The snoop response phase consists of two parts:
1) 82495XP state indication 2) 82495XP snoop processing
completion. The response phase is ALWAYS
synchronous with the CPU CLK. The
82495XP state indication is presented on MHITM#
and MTHIT# and remains stable until the next
snoop. These signals indicate the state of the
82495XP line just prior to the snoop operation. The
memory bus controller can predict the final state of
the 82495XP line knowing the initial state and the
SNPINV and SNPNCA inputs. The snoop completion
information is determined by the SNPBSY# output.
The SNPBSY# output inactive indicates that
the 82495XP is ready to accept another snoop cycle.


Figure 6-7 shows the 82495XP response to snoops
without invalidation. The first snoop is to a line which
is not currently stored in the cache.


Figure 6-8 shows the 82495XP response to snoops
with invalidation.


The SNPBSY# signal will be activated for one of
two reasons: 1) a snoop hit to a modified line;
SNPBSY# will remain active until the modified line
has been written back. 2) a back invalidation is
needed and there is a back invalidation in process.
The SNPBSY# minimum active time is two CLK periods.
This allows external logic to trap-hold active
SNPBSY# using CLK. The external logic must
first look for active SNPCYC# and then trap-hold
SNPBSY#.
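
Since the MBC can predict the final line state from the initial state and the SNPINV and SNPNCA inputs, the prediction can be sketched as below. This is an illustration, not the chipset's definition: the transitions for [M] and [E] lines follow Tables 4-3 and 4-4, while two points are assumptions made here: a snooped [S] line without invalidation is taken to stay [S], and any hit with invalidation is taken to end in [I]. The function name and the boolean convention (True means "asserted") are hypothetical.

```python
def snoop_result(state: str, snpinv: bool, snpnca: bool):
    """Return (mthit, mhitm, next_state) for a snoop to a line in `state`.

    state  -- one of "M", "E", "S", "I" (line state just prior to the snoop)
    snpinv -- snoop with invalidation requested
    snpnca -- snoop by a non-caching device
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError("state must be one of M, E, S, I")
    if state == "I":
        return (False, False, "I")  # miss: no tag hit, nothing changes
    mthit = True                    # tag hit
    mhitm = state == "M"            # hit to a modified line
    if snpinv:
        next_state = "I"            # assumption: invalidation always ends in [I]
    elif state in ("M", "E"):
        next_state = "E" if snpnca else "S"
    else:
        next_state = "S"            # assumption: [S] stays [S]
    return (mthit, mhitm, next_state)
```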


6.2.3 PIPELINED SNOOPS


The 82495XP allows the memory bus controller to
pipeline snoop operations. The 82495XP allows the
next snoop address to be supplied and the next
snoop requested before the last snoop has completed.


A set of rules governs the operation
of pipelined snoops. These rules are as follows:


(1) For strobed mode snoops, the memory bus controller
cannot cause a second falling edge of
SNPSTB# until after the falling edge of
SNPCYC#.

(2) For clocked mode snoops, the memory bus controller
cannot cause a second falling edge of
SNPSTB# to be sampled by SNPCLK, until after
the falling edge of SNPCYC#.

(3) For synchronous mode snoops, the memory bus
controller cannot cause a second falling edge of
SNPSTB# to be sampled by CLK, until the CLK
after SNPCYC# is active.


Figure 6-7. Snoops without Invalidation
(Snoops shown to lines in the I state, the E or S state, and the M state.)

Figure 6-10. Fastest Possible Asynchronous Snooping


6.2.4 OVERLAPPING SNOOPS WITH MEMORY 
BUS CYCLES 


The 82495XP allows snoops to be overlapped with
data transfers. The 82495XP divides the memory
bus cycle into 4 main regions as shown below:


CRDY# | Region 1 | CADS# | Region 2 | BGT# | Region 3 | SWEND# | Region 4 | CRDY# | Region 1 | CADS#


Region 1 is after a previous memory bus cycle (i.e.
after CRDY#) and before the new memory bus cycle
starts (before CADS#). A snoop in this region is
looked up immediately and serviced immediately.


Region 2 is after a memory bus cycle has started
(CADS#) but before the 82495XP has been granted
the bus (BGT#). A snoop in this region is looked up
immediately and serviced immediately. CADS# is
re-issued for the aborted cycle once the snoop completes.


Region 3 is after the 82495XP has been granted the
bus and before the SWEND# is completed. A snoop
in this region has its lookup blocked until after the
SWEND#. After SWEND#, the snoop response is
given, but no write-back will be initiated until after
CRDY#.


Region 4 is after SWEND# and before CRDY#. A
snoop in this region is looked up immediately but
serviced after CRDY#. This snoop is logically treated
as if it occurred after CRDY# (snoop hits to modified
data will schedule a write-back which will be
executed after the conclusion of the current memory
bus cycle). Note that the result of the snoop
(MHITM#, MTHIT#) will be available immediately
with the look-up.
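
The four regions can be modeled by tracking which cycle signals have been seen since the last CRDY#. The sketch below is a simplified illustration with hypothetical names; it assumes the three flags are cleared when CRDY# ends the cycle and does not model pipelined cycles.

```python
def snoop_region(cads_seen: bool, bgt_seen: bool, swend_seen: bool) -> int:
    """Region (1-4) in which a snoop falls, given the signals seen so far
    in the current memory bus cycle (flags are cleared at CRDY#)."""
    if not cads_seen:
        return 1
    if not bgt_seen:
        return 2
    if not swend_seen:
        return 3
    return 4


# (lookup, servicing) for each region, as described in the text above.
REGION_HANDLING = {
    1: ("immediate", "immediate"),
    2: ("immediate", "immediate"),   # the aborted cycle is re-issued afterwards
    3: ("after SWEND#", "after CRDY#"),
    4: ("immediate", "after CRDY#"),
}
```

For example, a snoop arriving after BGT# but before SWEND# falls in region 3, so `REGION_HANDLING[snoop_region(True, True, False)]` gives a lookup delayed until after SWEND# and any write-back delayed until after CRDY#.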
6.2.5 SNOOP INTERLOCK


The 82495XP uses two interlock mechanisms to ensure
that snoops are identified within the proper region.
The first interlock ensures that once a BGT#
has been given, snoops are blocked until after
SWEND#. The second interlock ensures that once
a snoop has been started, BGT# cannot be given
until after the snoop has been serviced.


Figure 6-11 shows how, once the 82495XP sees a
BGT#, it blocks all snoops until after SWEND#. If a
snoop has been initiated, and no SNPCYC# has
been issued before BGT# assertion, the snoop has
been blocked.


Figure 6-12 shows a snoop occurring before BGT#.
Once the 82495XP has honored a snoop, the
82495XP, depending on the result of the snoop, may
ignore BGT# until the snoop is serviced. The
82495XP will always ignore BGT# when SNPCYC#
is active. If the snoop result is a hit to a modified line
(MHITM# active), the 82495XP will ignore BGT# as
long as both SNPBSY# and MHITM# remain active.
In this case, it is the memory bus controller's
responsibility to hold BGT# until SNPBSY# goes
inactive or reassert it after SNPBSY# becomes inactive.
If the snoop result is not a hit to a modified line
(MHITM# inactive), the 82495XP is capable of accepting
BGT# even when SNPBSY# is active. This
allows the memory bus controller to proceed with a
memory bus cycle by asserting BGT# while the
82495XP is performing back-invalidations.


These two interlock mechanisms provide a flexible
method of ensuring predictable handling of overlapped
snoops.


NOTE:

Even when snoops are delayed, address latching is
performed with SNPSTB# activation.


Figure 6-12. Snoop Occurring before BGT#
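
The second interlock (when the 82495XP will accept BGT# while a snoop is outstanding) reduces to a short predicate. This is a sketch of the rule as worded above, with a hypothetical function name; each argument is True when the corresponding active-low signal is asserted.

```python
def bgt_accepted(snpcyc_active: bool, snpbsy_active: bool, mhitm_active: bool) -> bool:
    """True when the 82495XP would accept BGT# in the given snoop state."""
    if snpcyc_active:
        return False  # BGT# is always ignored while SNPCYC# is active
    if mhitm_active and snpbsy_active:
        return False  # hit to a modified line: write-back still pending
    return True       # e.g. back-invalidation in progress without MHITM#
```

An MBC design could use the same condition to decide whether to hold BGT# asserted or to reassert it once SNPBSY# goes inactive.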
6.2.6 SNOOPS CONCURRENT WITH LINE FILL
CYCLES


During snoops concurrent with line-fills/allocates,
the following responsibility boundaries must be fulfilled
in order to ensure data consistency:


• If a snoop happens before BGT#, more precisely
if SNPCYC# is active before BGT#, it is the system's
responsibility not to return stale data within
the line-fill/allocation.


• If a snoop happens after BGT#, more precisely if
SNPCYC# is active after BGT#, then the
82495XP ensures data consistency by providing
interlocks with the CPU which avoid caching of
stale data.


6.3 Memory Bus Controller Interface
Rules


To begin a cache cycle, the 82495XP outputs the
CADS# signal. The cache address and other cycle
parameters are guaranteed to be stable with
CADS# assertion. These parameters are guaranteed
to be stable until CNA# or CRDY# of that cycle.
After CNA# or CRDY# these parameters are
undefined.


Either during or after CADS#, the CDTS# signal is
asserted. With CDTS# assertion, either the data is
guaranteed to be stable or the data path is available.


BGT# and CRDY# are required for all (non-snoop)
cycles. KWEND# and SWEND# are only required
for those cycles which sample them.


Once a signal has been sampled, it is a "don't care"
until CRDY# of that cycle. Additionally, these signals
plus the attributes MRO#, MKEN#, MWB/WT#,
and DRCTM# need only follow setup and
hold times when they are being sampled.


For pipelined cycles, the cycle attributes (BGT#,
KWEND#, ...) will only be sampled after CRDY#
of the previous cycle.


Note that there are many other rules that govern
when signals may be asserted in relation to one another.
These may be found in the specific pin descriptions
of each signal in chapter 7.


Snoop-Write-Back cycles are a subset of the normal
cycles. Snoop-Write-Back cycles are requested as a
consequence of snoop hits to Modified lines. Those
are intervening cycles and are requested by activating
SNPADS# instead of CADS#. For those cycles,
the 82495XP only samples the CRDY# response.
The 82495XP assumes that the memory bus controller
owns the bus to perform the intervening write-back
(restricted back-off protocol) and that no other
agents will snoop this cycle. Also the 82495XP will
ignore CNA# during Snoop-Write-Backs.


Figure 6-13. Cycle Progress
(82495 output signals: CADS#, CDTS#. 82495 input signals: BGT#, KWEND#, SWEND#, CRDY#.)


Figure 6-14. Cycle Progress for Snoop Cycles
(82495 output signals: SNPADS#, CDTS#.)


6.4 LOCK# Protocol 


The 82495XP provides a LOCK signal for the memory
bus called KLOCK#. KLOCK# is generated by
the 82495XP whenever the CPU generates the
LOCK# signal. KLOCK#, like the other cycle attributes,
is valid with CADS# assertion.


When the CPU generates a LOCK cycle, the
82495XP always generates a bus cycle. LOCK cycles
are non-cacheable to both the 82495XP and
CPU, so the information is passed through the
82490XPs to the CPU with BRDYs generated by the
MBC. If the LOCKed read cycle is a hit in the
82495XP, the 82495XP ignores the data that it is
receiving and supplies data from the 82490XP array
(in accordance with the BRDYs supplied by the
MBC). Locked writes are posted like any other write.
LOCKed cycles, both reads and writes, never
change the 82495XP tag state.


During a LOCKed cycle, the MBC must prevent other
masters from snooping the 82495XP. Specifically,
the MBC must prevent SNPSTB# between BGT#
of the first LOCKed transfer, and SWEND# of the
last LOCKed transfer.
6.5 Cycle Length


When CADS# is generated, the 82495XP outputs
CW/R# and MCACHE#. These signals provide the
MBC with enough information to determine the type
of 82495XP cycle. Table 6-1 summarizes the cycle
types for the 82495XP/82490XP. All line-fills and
write-backs to the 82495XP/82490XP cache operate
on the entire length of a cache line.


In addition to the length of the cycle from the
82495XP/82490XP, the memory bus controller may
need to determine the length of the cycle to the
CPU. Specifically, for those 82495XP cycles where
RDYSRC=1, the MBC must decode the i860 XP
CPU's W/R#, LEN, and CACHE# outputs to determine
the number of BRDY#s which the MBC will
provide to the CPU. These signals are captured for
the current cycle by a user-provided BE latch (see
Section 7.2 for details). Table 6-2 presents the CPU
cycle length definitions; see the i860 XP microprocessor
Data Sheet (Order #240874) for further details.


Figure 6-15. Snooping During LOCKed Cycles
(A LOCKed read (CADS, LOCK, Read) followed by a LOCKed write (CADS, LOCK, Write) along a time axis. The MBC must not allow snoops from BGT of the read until SWEND of the write.)
Table 6-1. 82495XP/82490XP Cycle Determination

Columns: CW/R#; RDYSRC; MCACHE#; MKEN#; Cycle Type.

Legible cycle types include: I/O and Special Cycles; Non-Cacheable 128-Bit Read; 128-Bit Write; Cache Line Fill; Cache Write-Back.


NOTE:

If MRO# is asserted to the 82495XP, the effect on i860 XP CPU cycle determination is the same as MKEN# =


6.6 Consecutive Cycles 


Because a 82495XP line can be longer than a CPU 
line, there are circumstances where a read miss will 
be to a line that is currently being filled. If this is the 
case, the 82495XP treats this like a read hit, but
supplies data after CRDY# for the line fill. Data is 
supplied from the 82490XP array. 


6.7 CPU/Memory Bus Concurrency


The 82495XP allows concurrency between the CPU
and memory buses. CPU bus cycles will either be
serviced locally by the 82495XP (hits) or require
memory bus service. Whenever a CPU cycle requires
memory bus service, it will be scheduled to
run on the memory bus, and CPU bus activity will be
allowed to continue.


Examples of concurrency are:

— Snoops and CPU bus operations

— Posted writes with CPU and memory bus operations

— CPU bus operation on the back of long line fills
(82495XP line longer than the CPU line)

— Allocations and replacements with CPU and
memory bus operations.


In certain cases, consistency of data and prevention
of deadlocks preclude concurrency. Problems may
occur when the current memory bus cycle changes
the tag state and therefore affects the operation of
the next CPU cycle request. In those cases the
82495XP will hold concurrency to ensure data consistency.
Handling of those cases is completely
transparent to the MBC.


6.8 Memory Bus Modes

The 82490XP supports two modes of memory bus
operation: clocked mode and strobed mode. In
clocked mode, memory bus signals are sampled by
the 82490XP on rising edges of MCLK. Similarly,
memory bus data and signals are output by the
82490XP with respect to MCLK (or MOCLK)
edge transitions.

In strobed mode, memory bus signals are sampled
or output with respect to rising and falling edges of
other signals. Strobed mode has the advantage of
not requiring setup and hold times to a CLK or MCLK
edge.


[Timing diagrams. Clocked Memory Bus Mode: a signal sampled with
setup and hold times around a clock edge. Strobed Memory Bus
Mode: cycle start, with MSEL# shown active and then inactive.]

Figure 6-16. Clocked and Strobed Mode Sampling


6.8.1 CLOCKED MODE

In clocked mode operation MCLK is used to reference
the signals MDATA0-MDATA7, MSEL#,
MFRZ#, MZBT#, MBRDY#, and MEOC#. Clocked
mode will be selected if the 82490XP detects a
clock at its MCLK input after RESET. MCLK need
not have any relation to CLK. If this is the case, the
memory bus is said to be operating in "clocked
asynchronous" mode. If MCLK = CLK, the memory
bus is operating in "clocked synchronous" mode. If
MCLK x N = CLK (where N = 2, 3, 4 ...), the
memory bus is operating in "clocked divided
synchronous" mode. These three clocked modes,
asynchronous, synchronous, and divided synchronous,
are not differentiated by the 82490XP.
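
The naming of the memory bus modes can be sketched in a few lines of Python. This is an illustrative model only, not part of the 82490XP specification: the function name, the use of None to stand for an MCLK input with no clock present, and the example frequencies are all assumptions. The check looks at frequency alone, whereas a real synchronous design also needs MCLK derived from CLK with controlled skew.

```python
def classify_memory_bus_mode(clk_hz, mclk_hz=None):
    """Name the memory bus mode implied by the CLK and MCLK frequencies.

    mclk_hz=None is a hypothetical way to say "no clock at MCLK"
    (the part then operates in strobed rather than clocked mode).
    Frequencies are integers in Hz; phase and skew are not modeled.
    """
    if mclk_hz is None:
        return "strobed"
    if mclk_hz == clk_hz:
        return "clocked synchronous"
    # Divided synchronous: MCLK x N = CLK for an integer N >= 2.
    if mclk_hz < clk_hz and clk_hz % mclk_hz == 0:
        return "clocked divided synchronous"
    return "clocked asynchronous"
```

For example, with an assumed 50 MHz CLK, an MCLK of 25 MHz would be named divided synchronous (N = 2), while an unrelated 33 MHz MCLK would be named asynchronous. The 82490XP itself does not distinguish these three cases; the distinction matters only to the system designer.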


MOCLK controls a transparent latch at the 82490XP
data output pins. If a clock is provided at this input,
data is latched with MOCLK going low. This clock is
available in clocked mode only. MOCLK allows the
system to provide a greater MDATA hold time by
skewing MOCLK from MCLK. If MOCLK is tied high,
MDATA is driven from MCLK.


6.8.1.1 Synchronous Clocked Mode 


In synchronous clocked mode MCLK = CLK. This
means the CPU clock is used for the 82495XP,
82490XP, and the memory bus. A synchronous
memory bus allows memory to communicate with
the 82495XP without synchronizers since the
82495XP runs with CLK. With a synchronous design,
however, high clock frequencies must be routed to
all parts of a system with minimal skew. This may
not be possible with future projected frequencies. A
synchronous memory system and memory bus controller
must be redesigned when future speed upgrades
are required.

6.8.1.2 Asynchronous Clocked Mode


In asynchronous clocked mode, MCLK is not the
same frequency as CLK. Some memory signals,
since they reference MCLK, must be synchronized
to CLK to communicate with the 82495XP. For example,
when a cycle completes, the memory system
asserts a signal, driven from MCLK, to the memory
bus controller which will be synchronized to CLK to
become CRDY#. This is because CRDY# is synchronous
to CLK and not MCLK.

Asynchronous mode allows the rest of the system to
run at a lower frequency than the CPU CLK. Not only
does this simplify system design, but it allows the
designer to place hooks to allow the same design to
scale easily to a higher frequency. If all the features
of the 82495XP are used properly, an asynchronous
memory design does not have to incur much
synchronization penalty. For example, MEOC# is
synchronous to the memory environment (MCLK). This
allows the memory system to end the current cycle
and start the next before CRDY# is synchronized in
the CPU environment.


6.8.1.3 Divided Synchronous Clocked Mode

Divided synchronous clocked mode is a subset of
synchronous clocked mode. It allows two things to
happen: One, the memory system is capable of
communicating with the 82495XP without synchronization.
Two, a slower frequency clock may be routed
around the system.

Divided synchronous mode still requires clock skew
restrictions. It also carries the same scalability
drawbacks that full synchronous mode does.


6.8.2 STROBED MODE 


Strobed mode is configured on the 82490XP by
strapping MCLK high. In strobed mode:

- MDATA0-MDATA7 are sampled with respect to
  edges of MEOC#, MISTB, and MOSTB.

- For write cycles, MFRZ# is sampled when
  MEOC# goes active.

- MZBT# is sampled when MSEL# is inactive
  and is latched when MSEL# goes active.
  MZBT# is also sampled for the next operation
  when MSEL# is active and MEOC# goes active.

By not using MCLK, strobed mode has no setup and
hold time restrictions, and is scalable to higher
frequencies. Strobed mode does, however, require
synchronization to 82495XP CLK synchronous signals.

6.9 Memory Bus Operation


All data is handled by the 82490XP cache RAMs. 
The 82495XP instructs the 82490XP whether to use 
the data array or buffers, and specifically which buff- 
er to use. The MBC is responsible for bursting data 
in and out of the 82490XP’s, in and out of the CPU 
during miss cycles, and indicating when the opera- 
tion is finished. Communication between the 
82490XPs and memory bus may be done in a 
clocked mode or strobed mode. See the Memory 
Bus Modes section for more details. 


An 82490XP has 4 memory buffers. It has 2 memory
cycle buffers, one write-back buffer, and one snoop
buffer. Each buffer is capable of holding an entire
82495XP line of the longest configurable length.

The memory cycle buffers of the 82490XP are used
for posting writes and holding data during
82495XP/82490XP line-fills. The write-back buffer is
used for holding data from a cache replacement.
This data is ready to be written out, and the
write-back buffer is snoopable. The snoop buffer is used
to hold modified data that has been hit by a snoop.
Since snoop hits are the highest priority cycle, this
buffer will be emptied before any other cycle or
snoop request begins.


6.9.1 82490XP BUFFERS AND MUXES 


The 4 82490XP memory buffers are all multiplexed
(muxed) to the memory bus. The mux is used to select
which buffer is on the bus, and specifically which
slice of that buffer is on the bus. MBRDY# assertion
increments a counter for this mux which selects the
next slice of that buffer.

The counter used to increment through the buffer
slices is called the memory burst counter. The memory
burst counter follows the CPU burst order depending
on the subline address of the initial slice.
Once the MBC is finished with a buffer, MEOC# is
asserted to switch the mux to the next buffer to be
used. MEOC# will also reset the counter and latch
the last slice of data.


On the CPU side, the 82490XP contains a CPU buffer
and mux. The CPU buffer captures data from the
appropriate memory buffer or 82490XP array to feed
it to the CPU. The mux selects which slice is muxed
to the CPU bus. The counter for this mux is incremented
with BRDY#.

The 82490XP array contains a mux that selects
which way, based on the MRU algorithm, will be
read during hit cycles. This mux is also used during
write cycles to write to the correct way.

6.9.2 MEMORY CYCLE BUFFERS


There are 2 memory cycle buffers in the 82490XP.
They are used for line-fills, allocates, and memory
writes. The buffers are 64 bits wide (per 82490XP) to
support 8 transfers with 8 memory bus I/O pins
(maximum configuration). The 82490XP alternates
use of these buffers. When one buffer has a posted
write or is being used for a memory read, the other
one is available for the next cycle.

During allocation cycles, read for ownership may be
implemented by using the MFRZ# signal. If MFRZ#
is sampled active during the write cycle, the memory
cycle buffer will freeze the write data in the buffer so
the subsequent line-fill fills around it. This way the
write cycle need not be written to memory. The line
must be tagged as modified.
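
The "fill around" behavior can be pictured with a small Python sketch. It is a conceptual model only: the function name is hypothetical, and tracking the frozen write data at byte granularity is an assumption made for illustration rather than a statement about the 82490XP internals.

```python
def fill_around(buffer, frozen_offsets, fill_data):
    """Return the buffer contents after a line fill that keeps frozen bytes.

    buffer         -- bytes currently held in the memory cycle buffer
    frozen_offsets -- set of byte offsets holding write data frozen by MFRZ#
    fill_data      -- bytes returned from memory for the same line
    """
    if len(buffer) != len(fill_data):
        raise ValueError("buffer and fill data must cover the same line")
    # Frozen bytes keep the CPU's write data; all others take the fill data.
    return bytes(
        buffer[i] if i in frozen_offsets else fill_data[i]
        for i in range(len(fill_data))
    )
```

Because the merged line now differs from memory, it has to be tagged as modified, as stated above.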


6.9.3 WRITE BACK BUFFER AND SNOOP BUFFER

The write back buffer and snoop buffer are both 64
bits wide to handle the maximum 82495XP line length.
The write back buffer is used when replaced data
must be written back to main memory (including
FLUSH and SYNC cycles) and the snoop buffer is
used when data must be written out from a snoop
hit.

Before a line fill begins, the 82495XP checks to see
if it must remove a modified line to make room for
the line-filled line. If so, the modified line is placed in
the write back buffer and the line-fill is filled through
a memory cycle buffer. Should the line-fill be selected
as non-cacheable, both buffer contents are discarded
and the 82490XP array value remains as it
was before the line-fill.

There is no need to run the line-fill, replacement
(write back), FLUSH, or SYNC cycles contiguously.
If a snoop is requested between the two cycles, the
write back buffer is snoopable, and data can be written
directly out of it if need be.


6.9.4 MEMORY BUS CONTROL SIGNALS 


The main memory bus control signals are MSEL#,
MEOC#, MBRDY#, and CRDY#. These signals
control the 82490XP data path, buffers, and muxes.

MSEL# selects which 82490XPs are being used in
the current cycle by qualifying the MBRDY# signal.
If MSEL# is inactive, MBRDY# is not recognized for
that 82490XP. MSEL# is also used to reset the
memory burst counter. If MSEL# goes inactive, the
counter is initialized to its starting value. This is useful
for aborted/restarted cycles. MSEL# may remain
active for many or all cycles. MSEL# must, however,
be inactive sometime after RESET to initialize the
memory burst counter for the first time.

MEOC# is asserted by the MBC to finish with
the current buffer, and switch the memory bus to the
next buffer to be used. MEOC# latches in the last
piece of data and resets the memory burst counter
before switching to the new buffer.

MBRDY# is used to increment the memory burst
counter to select the next slice of data. This will
strobe data out of the 82490XP (write cycles) or load
data into the 82490XP (read cycles). MBRDY# is
ignored by the 82490XP if MSEL# is inactive.

CRDY# finishes the current cycle. Once CRDY# is
asserted, the 82490XP disposes of the information
in the buffers used in that cycle, and loads information
into the 82490XP array. CRDY# must be asserted
on or after the clock in which MEOC# is asserted for
a particular cycle.
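
The interaction of MSEL#, MBRDY#, and MEOC# with the memory burst counter can be modeled roughly as follows. This is a toy sketch under stated assumptions: the class and method names are hypothetical, the "next buffer" is taken from a fixed round-robin list (in the real part the 82495XP directs buffer selection), MEOC# is given priority when signals coincide, and the mapping from counter value to slice address (the CPU burst order) is not modeled.

```python
class MemoryBufferMux:
    """Toy model of the 82490XP memory-side buffer mux and burst counter."""

    def __init__(self, buffer_order, slices_per_buffer):
        # buffer_order: hypothetical list of buffer names in the order used.
        self.buffer_order = list(buffer_order)
        self.slices_per_buffer = slices_per_buffer
        self.buffer_index = 0
        self.burst_count = 0  # memory burst counter

    @property
    def current_buffer(self):
        return self.buffer_order[self.buffer_index]

    def sample(self, msel=False, mbrdy=False, meoc=False):
        """Apply one sampling point; True means the signal is active."""
        if meoc:
            # MEOC#: finish with this buffer, reset the counter, switch.
            self.burst_count = 0
            self.buffer_index = (self.buffer_index + 1) % len(self.buffer_order)
        elif not msel:
            # MSEL# inactive: the counter returns to its starting value
            # and MBRDY# is not recognized.
            self.burst_count = 0
        elif mbrdy:
            # MBRDY# qualified by MSEL#: advance to the next slice.
            self.burst_count = (self.burst_count + 1) % self.slices_per_buffer
```

A model like this can help when reasoning about aborted and restarted cycles: dropping MSEL# part way through a burst returns the counter to its starting value, so the restarted burst begins from the first slice again.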


6.9.5 82490XP DATA PATH

An example 82490XP read data path is shown in
Figure 6-17. The path between the CPU and memory
bus is a flow-through path, not a clocked path. Each
entire 82495XP cache line of data in the CPU buffer
is available at the memory buffer with some propagation
delay. Likewise, each entire 82495XP cache
line of data in the memory buffer is available in the
CPU buffer with some propagation delay. Data is
burst into and out of the memory buffer using
MBRDY# or MISTB/MOSTB. Data is burst into and
out of the CPU buffer using BRDY#. This means
there is no synchronization required between memory
and CPU data paths.

To give an example of how the path works during a
CPU line fill, data may be returned to the CPU in two
different fashions. One, each time the memory buffer
fills a dword, BRDY# may be asserted a clock
later to burst it back to the CPU. Two, the memory
buffer can be filled and then BRDY# asserted on
four consecutive clocks to burst the data back to the
CPU.


[Block diagram labels: MBRDY#, Data From Memory MUX, Mem Buffer,
CPU Latch, Data To CPU MUX.]

Figure 6-17. 82490XP Read Data Path

6.9.6 WRITE CYCLES

There are 3 basic types of write cycles: CPU generated
write cycles, write back cycles caused by a
cache replacement, and snoop write back cycles
caused by a snoop hit. All write cycles, except the
snoop write back, begin with CADS# assertion. The
snoop write back cycle begins with SNPADS#.


6.9.6.1 CPU Generated Write Cycles 


When the CPU begins a write cycle, four things can 
happen to it. One, the CPU write is a hit to a modi- 
fied or exclusive line. In this case the write is termi- 
nated by the cache immediately and invisibly to the 
MBC. 


Two, the write is to a shared location. This type of
write is posted to the 82490XP memory cycle buffers
and the cycle is terminated by the cache. If a memory
cycle buffer is occupied with a write cycle, the
CPU waits until the previous write completes. The
write cycle must be written to the memory bus so
that copies of the line in other caches can be
invalidated.

Three, the write is a cache miss. This type of write is
posted to a memory cycle buffer if the 82490XP is
not waiting for another posted write to complete. If
PALLC# is asserted, the write may be turned into an
allocation.


Four, the write is a LOCKed write. LOCKed writes 
are posted regardless of the tag state. The write is 
then treated as if it were a miss except that there is 
no change in the tag state and no allocation allowed. 
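
The four outcomes above can be summarized as a small decision function. The sketch below is only a restatement of the text in Python: the function name, the single-letter tag states (with "I" standing for a miss), and the returned labels are assumptions chosen for illustration.

```python
def classify_cpu_write(tag_state, locked=False):
    """Classify a CPU-generated write as described in the four cases.

    tag_state -- "M", "E", "S", or "I" ("I" is used here for a miss)
    locked    -- True for a LOCKed write
    """
    if tag_state not in ("M", "E", "S", "I"):
        raise ValueError("unknown tag state: %r" % (tag_state,))
    if locked:
        # Case four: posted regardless of tag state; no tag change,
        # no allocation.
        return "posted locked write"
    if tag_state in ("M", "E"):
        # Case one: completed by the cache, invisible to the MBC.
        return "completed in cache"
    if tag_state == "S":
        # Case two: posted and written to the memory bus.
        return "posted write to shared line"
    # Case three: miss; may become an allocation if PALLC# is asserted.
    return "posted write miss"
```

Note that the locked check comes first, since LOCKed writes are posted whatever the tag state.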


6.9.6.2 Cache Generated Write Cycles 


The 82495XP/82490XP will generate a write cycle in
three situations: a line fill or allocation causing a
cache replacement, a snoop hit to a modified location,
and write backs caused by FLUSH or SYNC.
Write backs caused by FLUSH or SYNC are
indistinguishable from write-back cycles caused by
replacement. Cache generated write cycles are the length
of a cache line.

Cache replacements and FLUSH/SYNC cycles
cause a line (or two lines if sectored) of cache data
to be placed in the write-back buffer of the 82490XP.
If no cycle is pending, CADS# is asserted and the
data is written out. If a snoop hits the write-back
buffer, the data is written out via SNPADS# like a
normal snoop hit. The write back is then cancelled
since the data was written through the snoop hit.

A snoop hit to a modified location causes a line of
cache data to be written out to memory. Snoop hits
are the highest priority cycle and must be serviced
immediately. A snoop hit to a modified location causes
the snooped line to be written to the snoop buffer
of the 82490XP. SNPADS# is then asserted and the
snoop is written out.


6.9.6.3 Memory Bus Controller Responsibility 


The MBC recognizes a write cycle with CADS# and
CW/R# (or SNPADS# for snoop cycles). If
MCACHE# is active, the MBC knows the cycle is a
write back cycle, otherwise it is a CPU-generated
cycle.
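
As a rough illustration of this decode, the following Python sketch takes each signal as a boolean meaning "active" so that pin polarity does not have to be modeled. The function name, the parameter names, and the returned labels are hypothetical.

```python
def decode_write_cycle(cads, is_write, mcache, snpads=False):
    """Classify a write cycle seen by the memory bus controller.

    cads     -- True when CADS# is asserted
    is_write -- True when CW/R# indicates a write
    mcache   -- True when MCACHE# is active
    snpads   -- True when SNPADS# is asserted
    Returns a label, or None if no write cycle is starting.
    """
    if snpads:
        return "snoop write-back"
    if cads and is_write:
        return "write-back" if mcache else "CPU-generated write"
    return None
```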


CPU-generated write cycles are written to the main
memory bus so that other caches can invalidate
their copies of this information. The other caches do
this by snooping with SNPINV active during snoop
initiation if they detect a write cycle on the bus.

Once the MBC detects CDTS# active, the data will
be available for writing in the next clock in the appropriate
82490XP buffer. The MBC should assert
MSEL# so bursting is enabled, and burst through
the write using MBRDY# (or MOSTB). MSEL# activation
also causes MZBT# to be sampled. MZBT#
must be inactive at this time if the data will be written
according to CPU burst order.

Once the write cycle is complete, MEOC# must be
asserted to end the write cycle and switch to the
next pending cycle. If this write cycle is turned into
an allocation, MFRZ# is sampled with MEOC# to
freeze the write data in the 82490XP.

MEOC# simply switches buffers from the current
one in use to the buffer of the next pending cycle.
CRDY# needs to be asserted to actually end the
cycle and allow the 82495XP and 82490XP to dispose
of the information.


6.9.6.4 Write Allocation and Read for 
Ownership 


The 82495XP/82490XP supports write allocation.
An allocation cycle is a read of a cache line caused
by a write miss to the same location. In its simplest
form, a write miss is written to memory, then the
82495XP requests a line from that same location.
Meanwhile, the CPU only sees the write cycle.

Write allocation may only be done if PALLC# is active
during CADS# of the write cycle. For the allocation
to occur, MKEN# must be returned active during
KWEND# of the write cycle. The write cycle may
be an actual write or a "dummy" write. Dummy
writes are write cycles that are terminated in the
82495XP and 82490XP as if they were normal
writes, but the data is not actually written to memory.
This saves a data write to memory.

During write allocation, the write cycle will progress
like a normal write cycle except MKEN# must be
active during KWEND# activation. If the write cycle
is a dummy write, MFRZ# must be used with
MEOC# so that the line filled data is read around
the write data into the 82490XP buffer. The line fill
cycle is like any other line fill cycle except the CPU
doesn't get any data. If a dummy write was performed,
DRCTM# must be asserted during
SWEND# activation to fill the line to the M state,
and any cache supplying the data must invalidate its
copy.

Using dummy write cycles and filling data to the M
state from another cache or memory is called Read
For Ownership. This is because ownership is being
transferred. In read for ownership cycles, memory is
avoided as much as possible. First, the dummy write
cycle avoids memory. Second, a line fill is performed
as a cache to cache transfer with DRCTM# asserted.
All caches were snooped with invalidate to eliminate
their copies.

For allocation cycles, SWEND# is not sampled for
the write portion of the allocation.
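
The conditions for write allocation can be collected into one helper, sketched below in Python. The function name, the parameter names, and the dictionary keys are hypothetical; the sketch records only the requirements stated in this section and says nothing about signal timing.

```python
def allocation_requirements(pallc_at_cads, mken_at_kwend, dummy_write):
    """Return what the MBC must do for a write allocation, or None.

    pallc_at_cads -- PALLC# active during CADS# of the write cycle
    mken_at_kwend -- MKEN# returned active during KWEND# of the write cycle
    dummy_write   -- True if the write data is not written to memory
    """
    if not (pallc_at_cads and mken_at_kwend):
        return None  # no allocation takes place
    return {
        # A dummy write must freeze the write data so the fill goes around it.
        "mfrz_with_meoc_required": dummy_write,
        # A dummy write leaves memory stale, so the line is filled to M.
        "drctm_at_swend_required": dummy_write,
    }
```

With both flags required, the dummy write case corresponds to the Read For Ownership sequence described above.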


6.9.7 READ CYCLES 


The CPU initiates all read cycles. These are usually
line fills to the CPU and line fills to the
82495XP/82490XP. The signal MCACHE# is output
with CADS# to indicate whether this cycle may or
may not be cacheable. If cacheable, MKEN# is returned
by the MBC to ultimately determine cacheability.

Read hit cycles are serviced by the cache without
MBC intervention. The only read cycles seen by the
MBC (except I/O or special) are read misses and
locked read cycles.

Read misses cause CADS# to be asserted at most
two clocks after ADS# of the CPU read cycle. If
cacheable, as determined from MCACHE#, the
MBC will return 4 BRDY#s back to the CPU and 4 or 8
MBRDY#s to the 82495XP/82490XP. If the transfer is
non-cacheable, the i860 XP CPU LEN and CACHE#
outputs indicate the number of transfers to be given
to the CPU. MBRDY# need not be used in the
transfer if only a single piece of data is required by
the CPU.

If the read cycle is cacheable, it may cause another
cached line to be bumped out of the cache. This is
called a replacement and, if modified, causes a write
back cycle. While one of the 82490XP memory buffers
is being filled for the line fill, the write back buffer
is loaded. If the line fill turns out to be non-cacheable
at the end of the transfer, the write-back buffer
is discarded, and the line in the cache remains valid.
Otherwise, CADS# will be generated after the read
cycle so the write back can be performed. The write
back need not happen immediately after the line fill
since the write-back buffer is snoopable.

All locked reads go to the memory bus. If the read is
a cache hit to the M state, the 82495XP/82490XP will ignore
the data that the MBC returns, and provide it from its
array. Locked reads are not cacheable by the CPU
or the 82495XP/82490XP. Snoop write-backs that
are a result of a LOCKed read/write request must
update memory.


6.9.7.1 Memory Bus Controller Responsibility

Once the MBC sees a read cycle on the memory
bus, it must determine whether the read is cacheable
or non-cacheable using MCACHE# and its own
address decoding. If non-cacheable, the CPU expects
a number of transfers as determined by its
LEN and CACHE# outputs. If cacheable, the CPU
expects 4 transfers, and the cache expects 4 or 8
(configuration dependent).

MKEN# is sampled during KWEND# to determine
cacheability. Before MKEN# is sampled, KEN# is
active assuming cacheability for the CPU. MKEN#
must be sampled 1 clock before the first BRDY# to
make the cycle non-cacheable.

Once the read cycle is given to the memory system,
all 82495XP/82490XP caches snoop to see if they
contain the data in modified form. If so, the MBC
must abort the cycle in memory and receive the data
directly from the 82495XP/82490XP that has it, or
wait until that cache writes it to memory. If the data
transfer avoids memory, i.e., goes cache to cache,
DRCTM# must be asserted with SWEND# to place
the line in the M state and the cache giving the data
must invalidate its copy.

MSEL# is activated and MBRDY# (or MISTB) used
to sample input data from the read cycle. Once
CDTS# has been seen active, the CPU read data
path is clear. BRDY# may be returned to the CPU
sometime after each MBRDY# for each piece of input
data (see MDATA setup to CLK). Once the
transfer completes, MEOC# and CRDY# are asserted
to complete the cycle in the 82495XP/82490XP.

6.9.8 I/O AND SPECIAL CYCLES


I/O and special cycles (flush, etc.) are decoded by
the 82495XP and not posted. These cycles wait until
all buffers have been written, and all cycles have
been completed, before they cause CADS# assertion.
The CPU waits until the special cycle ends with
the MBC's BRDY# assertion before it continues.

When the 82495XP/82490XP is performing a
FLUSH or SYNC, many write back cycles are required.
These cycles look like ordinary write back
cycles, and should be handled as such. FSIOUT# is
active during these write back cycles, so when
FSIOUT# goes inactive the cycle is complete and the
memory bus controller can supply BRDY# to the
CPU.


6.10 Different Bus Widths 


The 82490XP is capable of supporting either 64- or 
128-bit memory bus widths. Depending on the con- 
figuration, the 82490XP’s CPU and I/O busses may 
be multiplexed. The following diagram shows how 
an i860 XP CPU may be connected to a 128-bit 


memory bus: 
[Block diagram: an i860 XP CPU connected to a bank of 82490XPs;
data line groups labeled D0-D3, D60-D63, D64-D67, D68-D71, and
D124-D127.]

Figure 6-18. 82490XP On Wide Bus

In this example, the CPU port of the 82490XPs is in
x4 mode and the memory bus port is in x8 mode.
This allows all 128 bits of the memory bus to be
multiplexed to the 64-bit CPU bus.

For read cycles, each MBRDY# loads 8 bits into
each 82490XP. This is 128 bits of data. It will take 2
BRDY# assertions to load this into the CPU. The
first BRDY# assertion loads the first 4 bits of each
82490XP onto the CPU bus, and the next BRDY#
assertion loads the remaining 4 bits.

For a 64-bit write cycle, the data is available
on the appropriate data bits. On the i860 XP CPU
with a 128-bit bus, this is determined by CPU address
bit A3. The other data bits are undefined. For
write-back cycles, all 128 bits are available at once.
MBRDY# assertion will strobe the next 128 bits on
the memory bus.
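
The arithmetic behind this example can be checked with a short Python sketch. The function and parameter names are hypothetical, and the count of sixteen 82490XPs is inferred from 128 memory bus bits at 8 bits per part; it is not stated explicitly in the text.

```python
def wide_bus_transfers(num_82490xp=16, mem_bits_per_part=8, cpu_bits_per_part=4):
    """Return (memory bus width, CPU bus width, BRDY#s per MBRDY#).

    num_82490xp is an inferred value (128 bits / 8 bits per part).
    """
    mem_bus_bits = num_82490xp * mem_bits_per_part   # bits loaded per MBRDY#
    cpu_bus_bits = num_82490xp * cpu_bits_per_part   # bits delivered per BRDY#
    brdy_per_mbrdy = mem_bus_bits // cpu_bus_bits
    return mem_bus_bits, cpu_bus_bits, brdy_per_mbrdy
```

With the default values the function returns (128, 64, 2), matching the two BRDY# assertions needed for each MBRDY# on reads.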


7.0 DETAILED PIN DESCRIPTIONS 


The following chapter provides a detailed description
of each pin of the 82495XP and 82490XP. The
pins have been categorized by function. Each pin
description has a heading which summarizes the
most important aspects of the pin. The heading is
organized as:

Pin Name

Name Meaning
Pin Function
I/O, 82495XP/82490XP/i860 XP CPU, (location) Signal Type
Synchronous/Asynchronous

for example,

CADS#

Cache Address Strobe
Indicates beginning of cache cycle
Output from 82495XP (pin E3) Cycle Control Signal
Synchronous to CLK

Following the heading are three sections. The first
section, Signal Description, provides information on
what the signal does, how to use it, and in what
modes it operates. The second section, When Sampled
or When Driven, indicates all the exact places
where the part samples the signal, generates the
signal, or neither. The third section, Relation to Other
Signals, mentions the other signals that are affected
by this signal, synchronization requirements,
and shared pins.

All specific information about each pin is provided in
this chapter.


7.0.1 CONFIGURATION SIGNALS 


These signals are inputs to the 82495XP and
82490XP that are sampled at RESET and alter the
configuration and operation of the cache.

[Timing diagram: configuration input setup and hold times.]

Figure 7-1. Configuration Input Setup and Hold

Each set of configuration inputs may have different
setup times, but all signals have the same hold time.
The signals may be released on the CPU clock edge
that RESET is detected inactive. There are some
configuration signals that are strapping options and
cannot change their value during 82495XP operation.


7.0.2 CPU BUS INTERFACE SIGNALS 


These pins comprise the interface between the CPU
and the 82495XP/82490XP. The signals in this interface
are not flexible; Chapter 10 addresses the use of
these signals. The following are the CPU bus interface
signals:

TAG0-TAG11    SET0-SET10    CFA0-CFA6
ADS#          W/R#          D/C#
M/IO#         HITM#         LOCK#
PWT           PCD           LEN
BRDYC1#       KEN#          AHOLD
EADS#         BE0#-BE7#     INV
BOFF#

The majority of these signals must be connected
strictly between the i860 XP CPU and the 82495XP.
However, a subset of these signals is needed by the
MBC to decode the i860 XP CPU cycle in cases
where the MBC provides BRDY#s to the CPU. For
these purposes the following signals must also be
inputs to a latch controlled by the 82495XP's BLE#
output:

BE0#-BE7#     CACHE#        CTYP
LEN           PCD           PCYC
PWT

7.0.3 82495XP/82490XP INTERFACE SIGNALS


These pins comprise the interface between the
82495XP and 82490XP. The 82495XP uses these
pins to control the 82490XP and its buffers. The signals
in this interface are not flexible; Chapter 10 addresses
the use of these signals. The following are
the 82495XP/82490XP interface signals:

WAY           WRARR#        MAWEA#
BUS#          MCYC#         WBWE#[LR1]
WBA[SEC2]     WBTYP[LR0]    BRDYC2#
BLAST#        BOFF#

SIGNAL DESCRIPTIONS

7.1 BGT#

Bus Guaranteed Transfer

Signals 82495XP of memory bus controller's commitment
to complete the bus cycle.

Input to 82495XP (pin M4) Cycle Progress Signal
Synchronous


7.1.1 SIGNAL DESCRIPTION 


The 82495XP owns all bus cycles (initiated by
CADS#) until the memory bus controller accepts
ownership. During this time cycles may be aborted
due to a snoop. The memory bus controller signals
its acceptance of ownership by driving BGT# active
into the 82495XP. Once BGT# is driven active, the
memory bus controller is responsible for completing
the cycle on the memory bus. CRDY# signals completion
of the cycle.

Once BGT# is asserted, other devices may not perform
snoops into the 82495XP until the end of the
snooping window, SWEND# activation. The snoop
address is latched if a snoop is requested between
BGT# and SWEND#, but the snoop does not begin
until after SWEND# is asserted. SNPCYC# will not
be asserted until the snoop window ends with
SWEND# asserted. The advantage of asserting
BGT# early is that it allows the 82495XP to start
inquiries to the CPU, load the write-back buffer, and
progress forward in the CPU bus pipeline. The
disadvantage is that snooping of this 82495XP is now
blocked until SWEND# is asserted.


7.1.2 WHEN SAMPLED 


After the 82495XP asserts CADS#, it begins sampling
BGT# until it is sampled active.

BGT# is a "Don't Care" after it has been recognized
for cycle N and prior to the assertion of
CADS# for cycle N+1. In addition, BGT# is a
"Don't Care" once a cycle started by CADS# is
aborted by a snoop, until the cycle is restored by the
re-issuing of CADS#.


7.1.3 RELATION TO OTHER SIGNALS

When implementing BGT# in the MBC the following
rules should be used:

1. BGT# must follow every assertion of CADS#,
   unless the cycle is aborted due to a snoop.

2. It must precede CRDY# (for line fills and allocations
   BGT# must precede CRDY# by at least 3
   CLKs).

3. In addition BGT# must be asserted with or before
   the assertion of KWEND# and SWEND#.

4. BGT# must be asserted with or before the assertion
   of BRDY# by the MBC.

5. BGT# is not required following the assertion of
   SNPADS#.

6. BGT# must be asserted with or before MEOC#
   is asserted.
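
An MBC simulation could check the ordering rules above with a helper along the following lines. This is a sketch under stated assumptions: the function name and the dictionary of assertion times (in CLK numbers) are hypothetical, only the first assertion of each signal in a cycle is considered, and the snoop-related exceptions in rules 1 and 5 are not modeled.

```python
def check_bgt_rules(clk_of, line_fill_or_allocation=False):
    """Return a list of rule violations for one CADS#-initiated cycle.

    clk_of maps a signal name such as "BGT#" to the CLK number in
    which it was first asserted; signals that never occur are omitted.
    """
    violations = []
    bgt = clk_of.get("BGT#")
    if bgt is None:
        return ["BGT# missing after CADS#"]
    # Rule 2: BGT# precedes CRDY# (by at least 3 CLKs for fills/allocations).
    crdy = clk_of.get("CRDY#")
    min_lead = 3 if line_fill_or_allocation else 1
    if crdy is not None and crdy - bgt < min_lead:
        violations.append(
            "BGT# must precede CRDY# by at least %d CLK(s)" % min_lead
        )
    # Rules 3, 4, 6: BGT# with or before KWEND#, SWEND#, BRDY#, MEOC#.
    for name in ("KWEND#", "SWEND#", "BRDY#", "MEOC#"):
        when = clk_of.get(name)
        if when is not None and bgt > when:
            violations.append("BGT# asserted after %s" % name)
    return violations
```

An empty list means that none of the modeled rules was broken for that cycle.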


7.2 BLE# 


BE Latch Enable 


Controls latching of i860 XP CPU’s byte enable and 
cycle attribute signals 


Output of 82495XP (pin 016) 
Synchronous to CLK 


7.2.1 SIGNAL DESCRIPTION 


BLE# is used to control the enable line of an external
latch (clock edge triggered '377 type). This latch
is used to capture the i860 XP CPU's byte enables
(BE0#-BE7#) and CPU cycle attribute signals
which do not go through the 82495XP. The 82495XP
manages the opening and closing of this latch: when
BLE# is active, new values from the CPU enter the
latch at each rising edge of CLK.

The 82495XP latches the byte enables after ADS#
of a memory bus bound cycle. It relatches this information
with CRDY# or CNA# of that cycle if another
cycle is pending.


7.2.2 WHEN DRIVEN 


The 82495XP latches the BE latch signals 1 clock
after ADS# of a memory-bound cycle. Thus latched
BE0#-BE7# are valid with CADS#. The 82495XP
opens, then closes this latch if a cycle is pending
and CNA# or CRDY# is asserted. Thus latched
BE0#-BE7# are valid two clocks after CNA# or
CRDY#, which is one clock AFTER CADS# for
back-to-back cycles. The signals latched in the BE
latch are only valid for CPU generated memory bus
cycles (i.e., not an 82495XP generated writeback or
allocation).


7.2.3 RELATION TO OTHER SIGNALS 


The following CPU signals must be latched in the BE
latch:

BE0#-BE7#     CACHE#        CTYP
LEN           PCD           PCYC
PWT

All other signals in the 82495XP to CPU interface
(listed in sec. 7.0.2) must be connected only between
the i860 XP CPU and the 82495XP.


7.3 BRDY# 


Burst Ready

Memory Bus Controller Burst Ready input to
82495XP, 82490XP, and i860 XP CPU

Input to 82495XP and 82490XP (82495XP pin P14,
82490XP pin 60) Cycle Progress Signal
Input to i860 XP CPU (BRDY2#, pin U1)
Synchronous to CLK


7.3.1 SIGNAL DESCRIPTION 


The BRDY# input to both the 82495XP and
82490XP must be connected to the BRDY# signal
which the MBC is providing to the i860 XP CPU’s
BRDY2# pin. The signal is used by the 82495XP for
burst tracking purposes. In the 82490XP, it
increments the CPU latch burst counter.


During CPU read cycles, BRDY# allows the next 32
or 64-bit slice of read data to be available at the
82490XP’s CDATA outputs (CPU bus) by advancing
the CPU latch burst counter. At the same time,
BRDY# is latching the previous slice of data into the
i860 XP CPU. Refer to chapter 6 for more details.


During CPU write cycles, BRDY# is used to latch
each slice of write data into the CPU latches and
advance the latch counter.


During CPU special and I/O cycles (which are not
posted) BRDY# is used to end the cycle.


BRDY# must not be asserted until the bus is granted
(BGT# asserted) and until the data path is ready
for transferring (CDTS# is asserted).


7.3.2 WHEN SAMPLED


BRDY# is sampled by the CPU, the 82495XP, and
the 82490XP at every CLK edge. It must always
meet proper setup and hold times to CLK. Even
though the CPU latch may not be in use, BRDY#
assertion will still advance the latch counter.


7.3.3 RELATION TO OTHER SIGNALS


BRDY# controls the CPU and 82490XP CPU latches.
BRDY# has the following implication rules:


1. The last BRDY# for cycle N must be asserted 2
clocks before MEOC# for cycle N+1.

2. BRDY# > BGT#

3. BRDY# > CDTS#
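
Rules 2 and 3 above can be checked mechanically against a recorded clock-by-clock trace. The sketch below is illustrative only: the trace format (a list with one set of asserted signal names per CLK) and the function names are assumptions, and ">" is read as "asserted in a strictly later clock than".

```python
def first_clock(trace, signal):
    """Return the index of the first CLK in which `signal` is asserted, or None."""
    for clk, asserted in enumerate(trace):
        if signal in asserted:
            return clk
    return None


def brdy_rules_ok(trace):
    """Check BRDY# > BGT# and BRDY# > CDTS# for a single-cycle trace."""
    brdy = first_clock(trace, "BRDY#")
    if brdy is None:
        return True  # no BRDY# in this trace, nothing to check
    for earlier in ("BGT#", "CDTS#"):
        t = first_clock(trace, earlier)
        # The earlier signal must exist and must precede the first BRDY#.
        if t is None or t >= brdy:
            return False
    return True


good = [{"CADS#"}, {"BGT#"}, {"CDTS#"}, {"BRDY#"}]
bad = [{"CADS#"}, {"BGT#"}, {"CDTS#", "BRDY#"}]  # BRDY# in the same CLK as CDTS#
```

A checker like this is mainly useful in a simulation testbench for the memory bus controller, where violations of the ordering rules are otherwise hard to spot.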


7.4 C490LDRV 


82490XP Low Drive Buffer 

Selects the 82495XP low capacitance driving buffers

Input to 82495XP (pin M3) Configuration Signal
Synchronous to CLK


7.4.1 SIGNAL DESCRIPTION 


C490LDRV selects the driving strength of the
82495XP buffers that interface to the 82490XP.
Refer to the layout specifications for information on how
C490LDRV should be connected.


7.4.2 WHEN SAMPLED


C490LDRV is a configuration input sampled as shown in
Figure 7-1. C490LDRV requires a setup time of 4 CPU
clocks. After sampling, C490LDRV is a “don’t care”
until it is sampled as the BGT# pin after the first
CADS# assertion.


7.4.3 RELATION TO OTHER SIGNALS


C490LDRV shares a pin with BGT#.


7.5 CADS# 


Cache Address Strobe 

Indicates beginning of cache cycle

Output from 82495XP (pin E3) Cycle Control Signal 
Synchronous to CLK 


7.5.1 SIGNAL DESCRIPTION


CADS# requests the execution of a memory bus
cycle by the MBC, and indicates that the cycle
attributes (i.e., CD/C#, CM/IO#, CW/R#, PALLC#,
etc.) are valid.


If the 82495XP receives a snoop hit to an [M] state
line before BGT# is asserted by the MBC, the current
CADS# is aborted and reissued after the snoop
has completed. If the current line (issued by the
stalled CADS#) is invalidated by the snoop, then
that CADS# is cancelled (i.e., will not be reissued
after the snoop is completed).


CADS# is a glitch-free signal.


7.5.2 WHEN DRIVEN 

CADS# is asserted by the 82495XP for exactly one 
CLK, and is always a valid logic level. 

7.5.3 RELATION TO OTHER SIGNALS


CADS#, when asserted, indicates that the cache 
cycle control and attribute signals (ex. CD/C#, 
NENE#, CW/R#, etc.) are valid. 


Since allocations do not require BRDY#s to the
CPU, the CDTS# of an allocation cycle will always
occur with CADS# of the allocation. In normal cycles
the 82495XP will generate CADS# followed by
CDTS#.

CADS# == CDTS# for all write-through cycles.
Once CADS# is active, PALLC#, CWAY, CDTS#,
and BUS# are valid. Address and cycle specification
signals (MSET0-MSET10, MTAG0-MTAG11,
MCFA0-MCFA6, CW/R#, CM/IO#, CD/C#,
RDYSRC, MCACHE#, NENE#, SMLN#, KLOCK#,
and CPLOCK#) will be valid with CADS# active as
well.


Every CADS# initiated cycle requires a BGT# and
CRDY# input from the MBC.


CADS# and SNPADS# will never be asserted on 
the same CLK. 


7.6 CAHOLD 


82495XP AHOLD Output 

Self-test result and AHOLD output status 
Output of 82495XP (pin G4) Test Signal 
Synchronous to CLK


7.6.1 SIGNAL DESCRIPTION 


CAHOLD has two functions. One, it indicates the re- 
sult of the built-in self-tests of the 82495XP. Two, it 
represents the 82495XP AHOLD into the i860 XP 
CPU. 


The 82495XP drives CAHOLD after the 82495XP
self-tests have completed. CAHOLD should be
latched when FSIOUT# goes inactive after reset. If
CAHOLD is high, the self-tests have passed;
otherwise they have failed.


When the 82495XP drives AHOLD to the i860 XP
CPU, it also drives CAHOLD, thus providing a means
of tracking inquire cycles and back invalidations for
performance monitoring.


7.6.2 WHEN DRIVEN 


CAHOLD is always at a valid logic level. During
self-test, CAHOLD is held until the clock edge on which
FSIOUT# is sampled inactive. After self-test, or reset,
CAHOLD is asserted whenever the 82495XP
asserts AHOLD.


7.6.3 RELATION TO OTHER SIGNALS


CAHOLD reflects the value of AHOLD except during
self-test. During self-test, the value of CAHOLD
should be latched with the falling edge of FSIOUT#
to determine pass/fail.


7.7 CD/C# 


Cache Data/Code 

Indicates whether current cycle is Code or Data 
Output from 82495XP (pin D3) Cycle Control Signal 
Synchronous to CLK


7.7.1 SIGNAL DESCRIPTION 


CD/C#, along with CW/R# and CM/IO#, is an
82495XP cycle definition signal. It indicates the type
of bus cycle being requested of the MBC. CD/C#
can be pipelined by the memory bus controller (by
using the CNA# input to the 82495XP).


7.7.2 WHEN DRIVEN 


CD/C# is valid in the same CLK as CADS# and
remains valid until CRDY# or CNA#. CD/C# is
always a valid logic level.


7.7.3 RELATION TO OTHER SIGNALS


Address and cycle specification signals (MSET0-
MSET10, MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


7.8 CDATA0-CDATA7


CPU Data Bus Connection 
Data Bus Connection from 82490XP to CPU


Input/Output to 82490XP (pins 48, 54, 49, 55, 46,
51, 52, 57)


Isolated Interface


7.8.1 SIGNAL DESCRIPTION


CDATA0-7 is the 82490XP data bus connection to
the CPU. All or part of these 8 pins will be used in
connecting the 82490XP to the CPU depending on
the cache configuration. See layout information for
details.


7.9 CDTS# 


Cache Data Strobe

Indicates availability of CPU data bus

Output from 82495XP (pin F4) Cycle Control Signal
Synchronous to CLK


7.9.1 SIGNAL DESCRIPTION 


For read cycles, CDTS#, when asserted, indicates
that in the next CPU clock the data bus path is
available. This is the earliest time in which BRDY# may
be supplied to the CPU. For CPU initiated write
cycles, it indicates that the data is available on the
memory bus. For i860 XP CPU inquire cycles,
CDTS# informs the MBC that the last piece of
inquire data is valid on the CPU bus.


Usage of this signal allows complete independence
between address strobes (CADS# and SNPADS#)
and data strobe. CDTS# allows the 82495XP to signal
the MBC that a new cycle has begun as soon as
addresses are available. This allows memory bus
cycles to start before data is ready to be transferred.


CDTS# is a glitch-free signal.


[Table residue: cache configuration table with columns Config No., Line Ratio, Lines/Sector, No. of (remainder illegible); entries not recoverable. See Section 7.10.1.]


7.9.2 WHEN DRIVEN


CDTS# is asserted for one CLK, at the same time or
later than CADS# for any given cycle.


7.9.3 RELATION TO OTHER SIGNALS 


When the MBC samples CDTS# asserted, it can
begin providing BRDY#s for the read cycle to the
CPU in the next CLK. CDTS# must always be
asserted before CRDY# and must be asserted prior to
the first BRDY#.


The CDTS# of an allocation will always occur with
CADS# of the allocation. In normal cycles the
82495XP will generate CDTS# following CADS#.


CDTS# will be asserted at least one CLK after 
eenon 


7.10 CFG0-CFG2


Configuration Pins 
Determine Cache Characteristics 


Input to 82495XP (pins L4, Q1, M4) Configuration
Signals


Synchronous to CLK


7.10.1 SIGNAL DESCRIPTION


CFG0-CFG2 are the 3 cache configuration inputs
that determine cache characteristics such as line
ratio, tag size, and lines per sector. During RESET, this
information is passed on to the 82490XPs. The
following table maps CFG0-CFG2 to their respective
configurations for the i860 XP CPU:




7.10.2 WHEN SAMPLED 


CFG0-CFG2 are sampled as shown in Figure 7-1 with a
setup time of at least 10 CPU clocks. After sampling,
CFG0, CFG1, and CFG2 become cycle progress
input signals to the 82495XP and are sampled after
CADS# of the first cycle.


7.10.3 RELATION TO OTHER SIGNALS 


CFG0 shares a pin with CNA#, CFG1 shares a pin
with SWEND#, and CFG2 shares a pin with
KWEND#.


7.11 CLK 


i860 XP CPU, 82495XP, 82490XP Clock 
Input to the 82495XP (D171)


7.11.1 SIGNAL DESCRIPTION 


The CLK input determines the execution rate and
timing of the 82495XP, 82490XP, and CPU. Pin
timings are specified relative to the rising edge of this
signal. The i860 XP CPU, 82495XP, and 82490XP
require TTL levels on CLK for proper operation.


7.12 CM/IO# 


Cache Memory/IO
Indicates whether current cycle is Memory or IO 
Output from 82495XP (D4) Cycle Control Signal 
Synchronous to CLK 


7.12.1 SIGNAL DESCRIPTION 


CM/IO#, along with CW/R# and CD/C#, is an
82495XP cycle definition signal. It indicates the type
of bus cycle being requested of the MBC. CM/IO#
can be pipelined by the memory bus controller
(CNA# input to the 82495XP).


7.12.2 WHEN DRIVEN 


CM/IO# is valid in the same CLK as CADS#, and
remains valid until CRDY# or CNA#.


7.12.3 RELATION TO OTHER SIGNALS 


Address and cycle specification signals (MSET0-
MSET10, MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS# assertion.


7.13 CNA# [CFG0]


82495XP Next Address Enable 

Dynamically pipelines CADS# cycles 

Input to 82495XP (pin L4) Cycle Progress Signal 
Synchronous to CLK


7.13.1 SIGNAL DESCRIPTION 


CNA# is used by the MBC to dynamically pipeline
CADS# cycles. When active it indicates to the
82495XP that the next MBC request can be started.
Only one level of pipelining is allowed in the
82495XP.


CNA# is an optional input for all cycles initiated with
CADS#.


7.13.2 WHEN SAMPLED 


CNA# is sampled starting in the first CLK in which
BGT# is sampled active until CRDY# is sampled
active. CNA# is then ignored until the BGT# of the
next cycle.


CNA# is ignored during snoop write-back cycles. 
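
The sampling window described above can be expressed as a small helper. This is a sketch under stated assumptions: clocks are numbered as integers, the function name cna_sample_clocks is hypothetical, and the CLK in which CRDY# is sampled active is treated as the first clock outside the window (the text does not say whether that clock is included).

```python
def cna_sample_clocks(bgt_clk, crdy_clk, snoop_writeback=False):
    """Return the CLK indices in which CNA# is sampled for one cycle.

    bgt_clk:  CLK in which BGT# is first sampled active
    crdy_clk: CLK in which CRDY# is sampled active (assumed excluded)
    """
    if snoop_writeback:
        return []  # CNA# is ignored during snoop write-back cycles
    return list(range(bgt_clk, crdy_clk))


window = cna_sample_clocks(bgt_clk=2, crdy_clk=5)  # clocks 2, 3 and 4
```

Outside this window the MBC does not need to hold CNA# at a meaningful level for the cycle in question.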


7.13.3 RELATION TO OTHER SIGNALS 


Once the 82495XP samples this signal active, it
issues the CADS# for the next memory bus cycle as
soon as one begins.


CNA# is recognized between BGT# and CRDY#
or CDTS# and CRDY# of a given cycle.


7.14 CRDY#


Cache Ready 
Ends a cycle in the 82495XP/82490XP 


Input to 82495XP and 82490XP (pins M2, 43) Cycle
Progress Signal


Synchronous to CLK


7.14.1 SIGNAL DESCRIPTION


CRDY# is used by the 82495XP and 82490XP to
end a memory bus cycle. CRDY# indicates full
completion of the cycle and allows the
82495XP/82490XP to free internal resources for the
next cycle. In the 82490XP, this means that the
current memory buffer in use is emptied (put in array,
discarded, etc.). In the 82495XP, CRDY# assertion
allows 82495XP cycle progress signals (BGT#,
KWEND#, SWEND#) to be sampled for the next
cycle if pipelining is used.


CRDY# is required for all 82495XP/82490XP memory
bus cycles, including snoop cycles. CRDY#
must be asserted to the 82495XP and 82490XP at
the same time.


7.14.2 WHEN SAMPLED


CRDY# for a given cycle is ignored until KWEND#
is returned for that cycle. If KWEND# is not required
for the cycle, CRDY# is ignored until BGT#. When
CRDY# is ignored, it need not meet setup and hold
times.


7.14.3 RELATION TO OTHER SIGNALS 


CRDY# must be sampled by the 82495XP and
82490XP at the same time. For the 82495XP,
CRDY# has many cycle implication rules:


1. CRDY# > CDTS#

2. CRDY# > BGT#

3. CRDY# > BGT# + 2 clocks if the cycle is a line-fill
or allocation

4. CRDY# > KWEND# if the cycle is a line-fill or
write-through with potential allocation (PALLC# = 0)


For the 82490XP, CRDY# has three basic rules:


1. MEOC# for cycle N must be sampled with or
before CRDY# for cycle N.

2. MEOC# for cycle N+1 must be sampled at least
2 CPU clocks after CRDY# for cycle N.

3. CRDY# for cycle N+1 must be after the last
BRDY# for cycle N.
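
The four 82495XP implication rules listed above can be checked with a short predicate. The sketch below is illustrative only. Each argument is the CLK number in which the named signal is asserted, ">" is read literally as "in a strictly later clock", and rule 3 is read literally as CRDY# later than BGT# plus 2 clocks. The function and parameter names are hypothetical.

```python
def crdy_rules_ok(crdy, cdts, bgt, kwend=None,
                  line_fill_or_alloc=False, needs_kwend=False):
    """Check the 82495XP CRDY# ordering rules for one cycle."""
    ok = crdy > cdts and crdy > bgt            # rules 1 and 2
    if line_fill_or_alloc:
        ok = ok and crdy > bgt + 2             # rule 3 (literal reading)
    if needs_kwend:
        # Rule 4: line-fills and write-throughs with potential allocation.
        ok = ok and kwend is not None and crdy > kwend
    return ok


# A line-fill with BGT# in clock 2, CDTS# in 3, KWEND# in 4, CRDY# in 6.
result = crdy_rules_ok(crdy=6, cdts=3, bgt=2, kwend=4,
                       line_fill_or_alloc=True, needs_kwend=True)
```

The 82490XP rules involve MEOC# and BRDY# across two adjacent cycles and are not covered by this sketch.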


MBRDY# fills the current 82490XP memory buffer.
CRDY# empties this buffer and makes it available for
new cycles. CRDY# may be asserted on the same
clock as MEOC#, which may be asserted on the
same clock as MBRDY#.


CRDY# shares a pin with SLFTST#.


7.15 CWAY


Cache Way

Indicates WAY used by the current cycle


Output from 82495XP (pin J3) Cycle Control Signal
Synchronous to CLK


7.15.1 SIGNAL DESCRIPTION


CWAY is a cycle definition signal which indicates to
the MBC the WAY used by the requested cycle. On
line-fills it indicates the way the line will be loaded.
For write-hits (to [S] state or LOCKed) it indicates
the way which was a hit. For write-backs it indicates
the way that was written-back.


CWAY is utilized by external tracking machines in
order for the 82495XP tags to be accurately
duplicated.


7.15.2 WHEN DRIVEN 


CWAY is valid together with CADS# and remains
valid until CRDY# or CNA#.


7.15.3 RELATION TO OTHER SIGNALS


CWAY is valid with CADS#.


7.16 CW/R# 


Cache Write/Read 

Indicates whether current cycle is write or read 
Output from 82495XP (pin E4) Cycle Control Signal
Synchronous to CLK 


7.16.1 SIGNAL DESCRIPTION 


CW/R#, along with CD/C# and CM/IO#, is an
82495XP cycle definition signal. It indicates the type
of bus cycle being requested of the MBC. CW/R#
can be pipelined by the memory bus controller
(CNA# input to the 82495XP).


7.16.2 WHEN DRIVEN 

CW/R# is valid in the same CLK as CADS# and is 
valid until CRDY# or CNA#.

7.16.3 RELATION TO OTHER SIGNALS 


Address and cycle specification signals (MSET0-
MSET10, MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


7.17 DRCTM#


Memory Bus Direct to [M] State 


Signals 82495XP to tag data direct to the [M] state, 
skipping the [E] and [S] states. 


Input to the 82495XP (pin M1) Cycle Attribute Signal 
Synchronous to CLK 


7.17.1 SIGNAL DESCRIPTION 


DRCTM# is an input to the 82495XP from the memory
bus. When sampled active at the end of the
snooping window (SWEND# activation), the
82495XP moves the line fill in progress directly to
the [M] state.


There are three cases in which this is useful. 
1. Simplifies External State Tracker 


External trackers can only track the [M], [S], and 
[I] states. The [E] state can not be tracked exter- 
nally since cache write hits internally change [E] 
state lines to [M] state. DRCTM# can be used to 
eliminate the [E] state from the MESI protocol. 


2. Read For Ownership 


During a write miss with allocation the write may
go to the memory buffer and not be written to
memory. A read from memory, in conjunction with
the MFRZ# signal asserted, reads the data to fill
around the bytes written by the CPU. The contents
of the memory buffer are then entered into
the cache. The cache would normally tag this
data in the [E] state (the cache assumes the
write went to main memory). The system has the
option of never completing the write to memory
(this increases performance by completing the
allocation more quickly). If the write is not performed to
memory, the cache is the only owner of the new
data and therefore the cache entry must be
tagged to the [M] state.


3. Cache to Cache Transfer 


A cache to cache transfer may occur as a result
of a snoop. For example, CPU/Cache 1 performs
a read from main memory and CPU/Cache
2 flags it as a snoop hit to an [M] state line. To
expedite the transfer, the system may perform
the writeback from CPU/Cache 2 directly to
CPU/Cache 1, bypassing memory. CPU/Cache 1
assumes the write-back went to memory and
would normally tag the line to the [S] state. Since
the system did not perform the write to memory,
the system should drive DRCTM# to force the
line to the [M] state. In addition, the line should
be invalidated in CPU/Cache 2 by driving
SNPINV.


7.17.2 WHEN SAMPLED


DRCTM# is synchronous to CLK. It is only sampled 
when SWEND# is active (the end of the snooping 
window). When SWEND# is inactive DRCTM# is 
ignored and does not have to meet Setup and hold 
times. 


7.17.3 RELATION TO OTHER SIGNALS 


DRCTM# (direct to [M]) and MWB/WT# (write policy)
combine to define the memory bus attributes and
are sampled on CLK at the end of the snooping
window (SWEND# activation).


If MRO# is sampled active during KWEND#,
DRCTM# is ignored.
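
The effect of DRCTM# on the final tag state of a line fill can be summarized in a small decision function. This is a simplified sketch: the function name is hypothetical, and default_state stands for whatever state the 82495XP would otherwise choose (for example [E] or [S]), which depends on signals such as MWB/WT# that are not modeled here.

```python
def line_fill_state(drctm_active, mro_active, default_state):
    """Return the MESI state a line fill ends in.

    drctm_active:  DRCTM# sampled active with SWEND#
    mro_active:    MRO# sampled active with KWEND#
    default_state: state chosen without DRCTM# ("E" or "S"); an input
                   here because its derivation is outside this sketch
    """
    if mro_active:
        return default_state  # DRCTM# is ignored when MRO# was sampled active
    return "M" if drctm_active else default_state


# Cache to cache transfer: the line would be [S], DRCTM# forces [M].
state = line_fill_state(drctm_active=True, mro_active=False, default_state="S")
```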


7.18 FLUSH#


Flush 
Causes an 82495XP Cache Flush


Input to 82495XP (N4) Cache Synchronization
Signal


Asynchronous input 


7.18.1 SIGNAL DESCRIPTION 


This signal causes the 82495XP to flush all its modified
lines to main memory. The flushing of modified
lines requires the 82495XP to perform back-invalidation
and inquire cycles to the CPU. At the end of
flush, the 82495XP tag array will be completely
invalidated.


FLUSH# will invalidate the entire 82495XP tag array.
It takes two clocks to look-up and invalidate a
tag entry. The 82495XP will also invalidate tags in
the CPU cache by running back-invalidation cycles.
If the 82495XP tag state is modified, the 82495XP
will run inquire cycles to the i860 XP CPU to see if
the line is modified in its cache. If so, the i860 XP
CPU will write back the line into the 82495XP write
buffer. All modified 82495XP cache lines must be
written to memory.
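
The two-clocks-per-tag figure above gives a simple lower bound on the duration of a flush. The sketch below is a rough estimate only: it counts the tag look-up and invalidate clocks and ignores the time spent on inquire cycles, back-invalidations, and write-backs of modified lines. The function name and the example tag count are assumptions chosen for illustration.

```python
CLOCKS_PER_TAG = 2  # from the text: two clocks to look up and invalidate a tag entry


def flush_tag_scan_clocks(num_tag_entries: int) -> int:
    """Lower bound on CLKs needed to scan and invalidate the tag array."""
    if num_tag_entries < 0:
        raise ValueError("num_tag_entries must be non-negative")
    return CLOCKS_PER_TAG * num_tag_entries


# Illustrative value only; the real tag count depends on the cache configuration.
clocks = flush_tag_scan_clocks(4096)
```

Any modified lines add memory bus write-back time on top of this figure, so a real flush can take considerably longer.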


7.18.2 WHEN SAMPLED 


FLUSH# can be asserted at any time. The 82495XP
will complete all outstanding transactions on the
CPU and memory bus before beginning the
FLUSH# process. The memory bus controller does
not have to prevent FLUSH# during locked cycles
because the 82495XP will complete its locked
transaction before the FLUSH# process will begin.


Once a FLUSH# operation has begun, the FLUSH#
signal is ignored until the operation completes. If
RESET is activated while the FLUSH# operation is
in progress, the FLUSH# operation will be aborted
and the RESET immediately executed.


FLUSH# is an asynchronous input. FLUSH# must 
have a pulse width of 2 CLK’s in order to guarantee 
82495XP recognition. 


7.18.3 RELATION TO OTHER SIGNALS 


To initiate a FLUSH#, the 82495XP will complete all
pending cycles and prohibit the processor from issuing
any further ADS#’s while the FLUSH# is in
progress. The FSIOUT# output signal is used to
indicate the start and end of the FLUSH# operation. It
will become active when the FLUSH# signal is internally
recognized (all outstanding cycles have completed)
and will de-activate with the CRDY# of the
last FLUSH# write-back.


The memory bus controller supplies BRDY# to the
CPU once FSIOUT# has gone inactive and the
FLUSH is complete. Once FLUSH# has begun, and
FSIOUT# is active, all CADS#’s and CRDY#’s
correspond to write-backs caused by the FLUSH#
operation.


The 82495XP can be snooped during FLUSH# cycles
and the snooping protocols will be the same as
that for any memory bus cycle.


7.19 FPFLD# [FPFLDEN] 
External FIFO PFLD 


Indicates PFLD cycle during external PFLD FIFO
mode
Output of the 82495XP 4) Cycle Control Signal 
Sync to CLK


7.19.1 SIGNAL DESCRIPTION


During RESET, this pin functions as the FPFLDEN
configuration signal. The 82495XP can be configured
to decode the i860 XP microprocessor’s PFLD
cycles. The 82495XP supports 3 operational modes
for PFLD cycle decoding, as defined by FPFLDEN
and NCPFLD#:


Mode #1. PFLD cycles are cached in the 82495XP.


Mode #2. PFLD cycles are not cached in the
82495XP, without an external PFLD
extension FIFO.


Mode #3. PFLD cycles are not cached in the 82495XP,
with an external PFLD extension FIFO.


[Table residue: mode selection by FPFLDEN and NCPFLD#; entries not recoverable.]


If mode 3 has been selected, the 82495XP allows
the PFLD pipeline to be extended with an external
FIFO. After RESET, when this mode has been
selected, the FPFLD# output will indicate that the
requested cycle is a PFLD cycle. See Section 5.2.5 for
more details.


7.19.2 WHEN DRIVEN 


FPFLDEN is sampled on RESET as in Figure 7-1,
with a setup time of 4 CPU clocks. In PFLD mode
#3, the FPFLD# output is valid in the same CLK as
CADS# and remains valid until CRDY# or CNA#.


7.19.3 RELATION TO OTHER SIGNALS


Address and cycle specification signals (MSET0-
MSET10, MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


7.20 FSIOUT#


Flush, Sync, Initialization Output
Indicates the start and end of the Flush,
Sync, and Initialization operations.


Output of the 82495XP (01) Cache Synchronization
Signal


Sync to CLK


7.20.1 SIGNAL DESCRIPTION


This signal indicates the start and the end of either a
Flush, Sync, or Initialization (including self-test if
requested) operation. These operations are mutually
exclusive. This signal is activated when the 82495XP
begins the operation and goes inactive upon
completion of the operation.


7.20.2 WHEN DRIVEN 

This signal will be asserted whenever a Flush, Sync,
or Initialization operation is internally recognized by
the 82495XP and is in progress.


7.20.3 RELATION TO OTHER SIGNALS


FSIOUT# active indicates that either a Flush, Sync, or
Initialization operation is in progress. Only one of
these operations can be run within the 82495XP at a
time.


The table below shows the priorities of these three
operations:


Priority     Trigger
Highest      RESET
             FLUSH#
Lowest       SYNC#


If a trigger of higher priority occurs while a lower 
priority operation is running, the lower priority opera- 
tion is aborted and the higher priority one executed. 
If a trigger of lower priority occurs when a higher 
priority one is running, the lower priority trigger is 
ignored. Once a FLUSH# or SYNC# operation has 
begun, its trigger is ignored until the operation com- 
pletes. 


When a higher priority operation aborts a lower priority
one, FSIOUT# remains active.
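
The abort and ignore rules above amount to a small priority arbiter. The following sketch is illustrative only. The class and method names are hypothetical, and the priority order (RESET above FLUSH# above SYNC#) is partly an assumption: the text states that RESET aborts a FLUSH# in progress, while the relative order of FLUSH# and SYNC# is assumed here.

```python
PRIORITY = {"SYNC": 0, "FLUSH": 1, "RESET": 2}  # assumed order, lowest to highest


class FsiArbiter:
    """Tracks which Flush/Sync/Initialization operation is running."""

    def __init__(self):
        self.running = None  # name of the operation in progress, or None

    @property
    def fsiout_active(self):
        # FSIOUT# is active for as long as any operation is in progress.
        return self.running is not None

    def trigger(self, op):
        # Start the operation if idle, or abort a lower priority one.
        if self.running is None or PRIORITY[op] > PRIORITY[self.running]:
            self.running = op
        # Triggers of equal or lower priority are ignored.
        return self.running

    def complete(self):
        self.running = None


arb = FsiArbiter()
arb.trigger("SYNC")    # SYNC starts
arb.trigger("FLUSH")   # FLUSH aborts SYNC; FSIOUT# stays active
arb.trigger("SYNC")    # ignored while FLUSH runs
```

Note that FSIOUT# never drops between the aborted operation and the one that replaces it, which matches the statement that it remains active across an abort.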


Since RESET, FLUSH# and SYNC# are all
asynchronous, FSIOUT# will be activated when the
82495XP is actually internally executing the
operation.


7.21 HIGHZ# 


High Impedance Outputs

Causes 82495XP outputs to be tristated 
Input to 82495XP (pin P4) Test Signal 
Synchronous to CLK 


7.21.1 SIGNAL DESCRIPTION 


The 82495XP will enter self-test if both SLFTST# is
active and HIGHZ# is inactive during reset. If
SLFTST# is sampled active and HIGHZ# is
sampled active during reset, the 82495XP floats all its
outputs until the 82495XP is reset again. Activation
of HIGHZ# without SLFTST# does nothing.
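
The combinations described above form a small truth table, sketched below. The function name and the returned mode strings are hypothetical labels chosen for illustration. Both arguments represent the values sampled during reset.

```python
def reset_test_mode(slftst_active: bool, highz_active: bool) -> str:
    """Decode SLFTST# and HIGHZ# as sampled during reset."""
    if slftst_active and highz_active:
        return "float_outputs"  # all outputs tristated until the next reset
    if slftst_active:
        return "self_test"      # SLFTST# active, HIGHZ# inactive
    return "normal"             # HIGHZ# without SLFTST# does nothing


mode = reset_test_mode(slftst_active=True, highz_active=False)
```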


7.21.2 WHEN SAMPLED 


HIGHZ# is sampled as shown in Figure 7-1 with a setup
time of 10 CPU clocks. HIGHZ# is then a don’t care until
the 82495XP reset sequence is complete (with
FSIOUT# going inactive) where it becomes the MBALE
pin.


7.21.3 RELATION TO OTHER SIGNALS


HIGHZ# shares a pin with MBALE. 82495XP outputs
are tristated if both HIGHZ# and SLFTST# are
sampled active during reset.


7.22 KLOCK# 


82495XP LOCK# 

Request to MBC of LOCKed cycle 

Output from 82495XP (pin C3) Cycle Control Signal 
Synchronous to CLK


7.22.1 SIGNAL DESCRIPTION 


KLOCK# indicates to the MBC that there is a
request to execute a locked cycle. This signal follows
the CPU lock request.


KLOCK# is simply a one-clock flow-through version
of the CPU LOCK# signal. The 82495XP will acti- 
vate KLOCK# with CADS# of the first cycle of a 
LOCKed operation and it will remain active until the 
CADS# of the last cycle of the LOCKed operation. 


Note that if the memory bus is pipelined, there may
be a situation in which KLOCK# deactivation is in
the same CLK as its new activation (together with
CADS#). In this case KLOCK# won’t go inactive
between back-to-back locked sequences. KLOCK#
will never go inactive if the CPU LOCK# does not go
inactive. The 82495XP will not open arbitration
windows between back-to-back locked sequences; it is
the memory bus controller’s responsibility to
implement this functionality by detecting a LOCKed write
followed by a LOCKed read.


KLOCK# activation is not qualified by the tag array 
look-up (hit/miss indications); therefore, KLOCK# 
can be active before CADS# is asserted. 


7.22.2 WHEN DRIVEN 


KLOCK# assertion is a flow-through of 1 CLK from
the CPU LOCK# after the 82495XP completes all
pending cycles. KLOCK# deassertion is a
flow-through of 1 CLK from the CPU LOCK# signal, and
must be at least 1 CLK after the last CADS# of a
LOCKed sequence. KLOCK# is always driven to a
valid logic level.


7.22.3 RELATION TO OTHER SIGNALS 


Address and cycle specification signals (MSET0-
MSET10, MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


7.23 KWEND#


Cacheability Window End 

Closes 82495XP Cacheability Window 

Input to 82495XP (pin M4) Cycle Progress Signal 
Synchronous to CLK 


7.23.1 SIGNAL DESCRIPTION 


KWEND# is a cycle progress input to the 82495XP
that, when active, closes the cacheability window
and causes the cacheability attributes MKEN# and
MRO# to be sampled.


KWEND# is sampled by the 82495XP after BGT#
has been sampled active. KWEND# should be
asserted by the MBC once the memory address has
been decoded and cacheability (MKEN#) and
read-only (MRO#) attributes have been determined.


The sampling of KWEND# active allows SWEND#
to be sampled. Resolving KWEND# quickly allows
the non-cacheable window between BGT# and
SWEND# to be closed more quickly. KWEND#
activation also allows the 82495XP to start allocations
and begin replacements.


7.23.2 WHEN SAMPLED


KWEND# is sampled by the 82495XP on the clock in
which, or after, BGT# has been sampled active. Once
KWEND# is sampled active it is not sampled again
until BGT# of the next cycle. KWEND# need not
follow setup and hold times if it is not being sampled.


BGT#, KWEND# and SWEND# may be asserted
on the same clock edge.


KWEND# need only be activated for those cycles
which require the sampling of MKEN# and MRO#.
These are line-fills and write cycles with potential
allocation.


7.23.3 RELATION TO OTHER SIGNALS


KWEND# is sampled on or after BGT# and allows
the sampling of SWEND#. KWEND# activation
causes the sampling of MKEN# and MRO#.


According to cycle progress implication rules,
CRDY# must be at least one clock after KWEND#
for line fills and write-through cycles with potential
allocate.


KWEND# shares a pin with CFG2.


7.24 MALE


Memory Address Latch Enable 
Tristates/Enables Memory Address Outputs
Input to 82495XP (pin 02) Cycle Control Signal
Asynchronous


7.24.1 SIGNAL DESCRIPTION 


The 82495XP contains an address latch which controls
the last stage of the 82495XP address output. It
is controlled by four signals: MAOE#, MBAOE#,
MALE, and MBALE. The signals MALE and MBALE
control the latching of the entire 82495XP address
where MBALE controls the subline portion and
MALE controls the rest.


MALE is provided so that the memory bus controller
can control when the next pipelined address is
driven. With MALE high, the 82495XP address latch is in
‘flow-through’ mode and the 82495XP address is
available at the memory bus. Changes in the
82495XP address are seen immediately at the memory
bus. When MALE is driven low the address at
the latch input is latched. Any subsequent address
driven by the 82495XP will not be seen at the memory
bus outputs until MALE is driven high again.


MALE will latch 82495XP addresses regardless of
the state of MAOE#. If MAOE# is inactive, MALE
will still operate the latch properly, but the memory
bus will be tristated.
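
The interaction between MALE and MAOE# described above can be illustrated with a behavioral model of a transparent latch followed by an output enable. This is a sketch only: the class name, method names, and the "Z" marker for a tristated bus are hypothetical, and the subline portion controlled by MBALE and MBAOE# is not modeled.

```python
class AddressLatch:
    """Transparent latch (MALE) followed by an output enable (MAOE#)."""

    def __init__(self):
        self.held = None  # address currently held at the latch output

    def update(self, address, male_high):
        # MALE high: flow-through, the latch output follows the input.
        # MALE low: the previously latched address is held.
        if male_high:
            self.held = address
        return self.held

    def pins(self, maoe_active):
        # The latch operates regardless of MAOE#; MAOE# only gates the pins.
        return self.held if maoe_active else "Z"


latch = AddressLatch()
latch.update(0x100, male_high=True)    # address flows through
latch.update(0x200, male_high=False)   # new address is not passed on
held = latch.pins(maoe_active=True)    # still 0x100
```

Holding MALE low lets the memory bus controller keep the current address stable on the bus while the 82495XP already drives the next pipelined address internally.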
7.24.2 WHEN SAMPLED 


MALE is asynchronous and can be asserted and
deasserted at any time. MALE should always be
driven to a valid state since it directly controls the
operation of the address latch.


7.24.3 RELATION TO OTHER SIGNALS


MALE together with MBALE control the latching of
the entire 82495XP output address. The other latch
control signals, MAOE# and MBAOE#, provide the
memory bus controller complete command over the
address outputs. MAOE# and MBAOE# do not
affect the operation of MALE or MBALE.


MALE shares a pin with the WWOR# configuration
pin.


7.25 MAOE#


Memory Address Output Enable 
Tristates/Enables Memory Address Outputs 
Input to 82495XP (pin S4) Cycle Control Signal 
Asynchronous except during snoop cycles 


7.25.1 SIGNAL DESCRIPTION 


The 82495XP has an address latch which is controlled
by a latch input, MALE, and an output enable
input, MAOE#. MAOE# has two main functions.
One, driving MAOE# active will enable the 82495XP
to drive its address lines MTAG0-11, MSET0-10,
and MCFA0-6. Two, MAOE# is a qualifier for snoop
cycles and must be inactive for the 82495XP to
snoop.


In general, MAOE# should be active if its 82495XP 
is the current bus master. When that 82495XP gives 
up the bus, MAOE# should be inactive to float the 
address lines and allow another master to snoop. 


MAOE# controls the output of the 82495XP ad- 
dress except the subline (burst) portion. This portion 
has a separate output control: MBAOE#. 


7.25.2 WHEN SAMPLED 


MAOE# is an asynchronous input (except during
snoop cycles) and always has full control over the
address output. For this reason, MAOE# must
always be driven to a valid state.


The 82495XP does, however, sample MAOE# during
snoop cycles. When sampled, MAOE# must
meet proper setup and hold times. In synchronous
snoop mode MAOE# is sampled on a CLK edge. In
clocked mode MAOE# is sampled on a SNPCLK
edge. In strobed mode MAOE# is sampled with the
falling edge of SNPSTB#. If MAOE# is sampled
active, the snoop will be ignored. This allows
SNPSTB# to share a common line for multiple
82495XPs.


MAOE# need not meet any setup or hold time if it is
not being sampled during a snoop cycle.


7.25.3 RELATION TO OTHER SIGNALS 


MAOE # together with MBAOE# control the entire 
82495XP address. Both signals are asynchronous 
and thus need never be synchronized to any clock. 
Both signals are, however, sampled during snoop 
cycles and require proper setup and hold times in 
these situations. | 
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MALE and MAOE# together provide full control 
over the 82495XP address output latch. 


7.26 MBALE


Memory Burst Address Latch Enable 
Tristates/Enables Memory Burst Address Outputs 
Input to 82495XP (pin P4) Cycle Control Signal 
Asynchronous


7.26.1 SIGNAL DESCRIPTION 


The 82495XP address latch is controlled by four
signals: MAOE#, MBAOE#, MALE, and MBALE. The signals
MALE and MBALE control the latching of the entire
82495XP address, where MBALE controls the subline
portion and MALE controls the rest.


MALE and MBALE are provided so that the memory
bus controller has complete flexibility over when the
next address is driven. With MBALE high, the subline
portion of the 82495XP address latch is in
"flow-through" mode and the 82495XP subline address is
available at the memory bus. Changes in the
82495XP subline address are seen immediately at
the memory bus. When MBALE is driven low, the
subline address at the latch input is latched. Any
subsequent subline address driven by the 82495XP
will not be seen at the memory bus outputs until
MBALE is driven high again.


MBALE will latch 82495XP addresses regardless of 
the state of MAOE# or MBAOE#. If MBAOE# is 
inactive, MBALE will still operate the latch properly, 
but the subline portion of the memory bus will be 
tristated. 
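

The latch behavior described above can be sketched as a simple transparent-latch model. This is a behavioral illustration under stated assumptions, not device logic: SublineLatch and its update method are hypothetical names, the subline address is a plain integer, and None stands for a tristated output.

```python
from typing import Optional


class SublineLatch:
    """Behavioral model of the subline portion of the address latch (hypothetical)."""

    def __init__(self) -> None:
        self.held = 0  # value currently captured by the latch

    def update(self, subline_in: int, mbale_high: bool, mbaoe_active: bool) -> Optional[int]:
        if mbale_high:
            # Flow-through mode: the latch follows its input.
            self.held = subline_in
        # With MBALE low the latch keeps the value it held when MBALE went low.
        # MBAOE# only gates the output; it does not affect latching.
        return self.held if mbaoe_active else None


latch = SublineLatch()
outputs = [
    latch.update(2, mbale_high=True, mbaoe_active=True),    # flow-through
    latch.update(3, mbale_high=False, mbaoe_active=True),   # latched, input change hidden
    latch.update(1, mbale_high=True, mbaoe_active=False),   # latch updates, output tristated
    latch.update(0, mbale_high=False, mbaoe_active=True),   # value captured while tristated
]
print(outputs)
```

The third and fourth steps show the point made in the text: MBALE operates the latch even while MBAOE# is inactive, so the captured value appears once the output is enabled again.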


Separate line and subline address latch controls are 
provided so that the latch outputs may be driven at 
different times. The table below indicates the subline 
address bits for each line size. 


Line Size (Bytes)      Subline Address

                       A3, A4
                       A4, A5
                       A5, A6


7.26.2 WHEN SAMPLED 


MBALE is asynchronous and can be asserted and 
deasserted at any time. MBALE should always be 
driven to a valid state since it directly controls the 
operation of the address latch.


7.26.3 RELATION TO OTHER SIGNALS


MALE, together with MBALE, controls the latching of
the entire 82495XP output address. The other latch
control signals, MAOE# and MBAOE#, provide the
memory bus controller complete command over the
address outputs. MAOE# and MBAOE# do not affect
the operation of MALE or MBALE.


MBALE shares a pin with the HIGHZ# configuration 
pin. 


7.27 MBAOE#


Memory Burst Address Output Enable
Tristates/Enables Memory Subline Address Outputs
Input to 82495XP (pin P6) Cycle Control Signal
Asynchronous except during snoop cycles


7.27.1 SIGNAL DESCRIPTION


The 82495XP address latch is controlled by four
signals: MAOE#, MBAOE#, MALE, and MBALE.
MAOE# and MBAOE# are the output enables of
this latch for the entire 82495XP address.
Specifically, MBAOE# controls the subline address
portion and MAOE# controls the rest.


MBAOE# has two functions. One, it can tristate the
subline portion of the address separately from the
rest of the address. Since the 82495XP does not
sequence through burst addresses, the memory system
may wish to provide the burst count. This requires
that the 82495XP address burst portion be
tristated after the first transfer. The Subline Address
table appears in Section 7.26, MBALE.


Two, MBAOE# is sampled during snoop cycles. If
MBAOE# is sampled inactive, the snoop write back
cycle, if any, will begin at the subline address
provided. If MBAOE# is sampled active, the snoop write
back will begin at subline address 0. This allows
snoop write backs to begin at the snooped subline
address and progress through the normal burst order.
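

The second function can be written as a one-line selection rule. The sketch below is illustrative only; snoop_writeback_start is a hypothetical helper, the subline address is an integer index, and True stands for MBAOE# sampled active (low).

```python
def snoop_writeback_start(snooped_subline: int, mbaoe_active: bool) -> int:
    """Subline address at which a snoop write back begins."""
    # MBAOE# sampled active: the write back starts at subline address 0.
    # MBAOE# sampled inactive: it starts at the subline address supplied with the snoop.
    return 0 if mbaoe_active else snooped_subline


print(snoop_writeback_start(2, mbaoe_active=False))
print(snoop_writeback_start(2, mbaoe_active=True))
```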


7.27.2 WHEN SAMPLED 


Like MAOE#, MBAOE# is asynchronous except
during snoop cycles and can be asserted or
deasserted at any time. Since MBAOE# has direct
control over the address latch, it must always be
driven to a valid state.


MBAOE# is, however, sampled during snoop cycles.
In synchronous snooping mode, MBAOE# must meet
proper setup and hold times to CLK's rising edge.
In clocked mode, MBAOE# must meet setup and hold
times to SNPCLK's rising edge. In strobed mode,
MBAOE# must meet setup and hold times to
SNPSTB#'s falling edge.


If MBAOE# is not sampled for a snoop, i.e.,
SNPSTB# is not asserted, MBAOE# need not meet
any setup or hold time.


7.27.3 RELATION TO OTHER SIGNALS


MAOE# and MBAOE# control the entire 82495XP
address output asynchronously. This address latch
is completely controlled by MALE, MBALE, MAOE#,
and MBAOE#.


MBAOE# is only sampled by the 82495XP during
snoop cycles with SNPSTB#.


7.28 MBRDY# 


Memory Burst Ready 
Burst Ready input to 82490XP memory buffers 
Input to 82490XP (pin 22) Cycle Progress Signal 


Synchronous to MCLK


7.28.1 SIGNAL DESCRIPTION 


When in clocked memory bus mode, MBRDY# (with
MSEL# active) is used to advance the memory
burst counter for the 82490XP buffer in use. This
causes either new data to be latched from the
memory bus (read cycle), or new data to be driven from
the 82490XP buffer (write cycle). MBRDY# is sampled
on all MCLK edges in which MSEL# is sampled
active and has no relation to CLK. In strobed mode,
MBRDY# must be tied high as MISTB/MOSTB
strobes data in/out of the 82490XP.


For write cycles, the first piece of write data is
available at the MDATA pins. MBRDY# assertion with
MSEL# active causes the next 32, 64, or 128-bit
slice of write data to be available. If only one slice is
required, MSEL# and MBRDY# need never go active.


For read cycles, the first piece of read data flows
through to the CPU. MBRDY# assertion with
MSEL# active causes the next slice of memory data
to be latched in the 82490XP buffer. BRDY# assertion
will allow this data to be available on the CPU
bus and latch it into the CPU. For cacheable cycles,
MBRDY# needs to be asserted 4 or 8 times
depending on the cache configuration.


7.28.2 WHEN SAMPLED


MBRDY# is sampled on all MCLK edges where
MSEL# is sampled active. In this way MSEL#
qualifies the MBRDY# input. If MSEL# is sampled
inactive, MBRDY# need not follow setup and hold times
to MCLK.


7.28.3 RELATION TO OTHER SIGNALS 


MBRDY# is qualified by the MSEL# input.
MBRDY# advances the memory burst counter for
the 82490XP in use, which either inputs or outputs
data through MDATA.


MEOC# switches the 82490XP buffers to the next
pending cycle, so the last MBRDY# must come
before or on the clock of MEOC# assertion.


7.29 MCACHE#


82495XP Internal Cacheability
Indicates cycle cacheability attribute
Output from 82495XP (pin C2) Cycle Control Signal
Synchronous to CLK


7.29.1 SIGNAL DESCRIPTION 


MCACHE# is driven by the 82495XP and indicates
that the current cycle may be cached. Data
cacheability is determined later in the cycle by MKEN#
assertion. MCACHE# is asserted for allocation,
replacement write-back cycles, and during cacheable
read-miss cycles (i.e., read-miss cycles in which PCD
is not asserted). It is not asserted for I/O, special, or
locked cycles.
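

As an illustration, the assertion rule stated above can be captured in a small predicate. This is a sketch that covers only the cycle types named in the paragraph; the Cycle enumeration and the mcache_asserted function are hypothetical names, and other cycle types are deliberately left out because the text does not describe them.

```python
from enum import Enum, auto


class Cycle(Enum):
    ALLOCATION = auto()
    REPLACEMENT_WRITE_BACK = auto()
    READ_MISS = auto()
    IO = auto()
    SPECIAL = auto()
    LOCKED = auto()


def mcache_asserted(cycle: Cycle, pcd_asserted: bool = False) -> bool:
    """Return True when MCACHE# is asserted for the given cycle type."""
    if cycle in (Cycle.ALLOCATION, Cycle.REPLACEMENT_WRITE_BACK):
        return True
    if cycle is Cycle.READ_MISS:
        # A read miss is cacheable only when PCD is not asserted.
        return not pcd_asserted
    # I/O, special, and locked cycles never assert MCACHE#.
    return False


print(mcache_asserted(Cycle.READ_MISS, pcd_asserted=False))
print(mcache_asserted(Cycle.LOCKED))
```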


[Table: MCACHE# state by cycle type. Surviving row
labels: Write Backs, Read (PCD = 0), Allocation,
I/O Cycles, Locked Cycles.]


7.29.2 WHEN DRIVEN 


MCACHE# is valid in the same CLK as CADS# and
remains valid until CRDY# or CNA#.


7.29.3 RELATION TO OTHER SIGNALS


Address and cycle specification signals (MSET0-
MSET10, MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


7.30 MCFA0-MCFA6
MSET0-MSET10
MTAG0-MTAG11


MCFA0-MCFA6 Memory Configuration Address I/O
MSET0-MSET10 Memory Set Address I/O
MTAG0-MTAG11 Memory Tag Address I/O
82495XP Memory Address Inputs/Outputs


Input/Output of 82495XP (pins N14, P7-P15, O6-O16,
R4, R14-R17, S14-S17) Cycle Control Signals


Input Synchronous to CLK, SNPCLK, or SNPSTB#.
Output from CLK, MAOE# active or MALE high.


7.30.1 SIGNAL DESCRIPTION 


MSET0-10, MTAG0-11, and MCFA0-6 provide the
complete 30-bit address input/output interface of
the 82495XP to the memory bus. Together they
span the entire CPU address range A2-A31.
Depending on the cache configuration, each pin
represents a different CPU address line (see
configuration section for details).
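

The 30-bit figure follows directly from the three pin groups, as the short check below shows. It is plain arithmetic on the pin ranges named above; the dictionary and variable names are illustrative only.

```python
# Pin index ranges taken from the signal names: MSET0-10, MTAG0-11, MCFA0-6.
PIN_GROUPS = {
    "MSET": range(0, 11),  # 11 pins
    "MTAG": range(0, 12),  # 12 pins
    "MCFA": range(0, 7),   # 7 pins
}

total_pins = sum(len(pins) for pins in PIN_GROUPS.values())

# CPU address lines A2 through A31 inclusive.
cpu_address_lines = range(2, 32)

print(total_pins, len(cpu_address_lines))
```

Both counts come to 30, so every CPU address line from A2 to A31 has exactly one memory address pin, with the pin-to-line mapping depending on the cache configuration.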


MSET0-10, MTAG0-11, and MCFA0-6 pass
through an 82495XP output latch. The latching of this
latch is controlled by MALE/MBALE, and the output
of this latch is controlled by MAOE#/MBAOE#.


With MAOE#/MBAOE# active, MSET/MTAG/MCFA
are 82495XP outputs. They are valid at the
start of a memory bus cycle at the input of the
82495XP address latch. If MALE/MBALE is high
(flow-through) and MAOE#/MBAOE# is active
(outputs enabled), they are driven to the memory
bus with CADS#.


If a new cycle starts and MALE/MBALE is low, the
previous address remains valid at the 82495XP
MSET/MTAG/MCFA outputs. Once MALE/MBALE
goes high, the new address flows through with the
appropriate propagation delay (MSET/MTAG/MCFA
address valid delay from MALE/MBALE going high).
The new address will be driven to the 82495XP
MSET/MTAG/MCFA outputs if MAOE#/MBAOE# is
active.


If a new cycle starts, MALE/MBALE is high, and
MAOE#/MBAOE# is inactive, the 82495XP
MSET/MTAG/MCFA outputs will remain tristated. Once
MAOE#/MBAOE# is asserted, the new address
flows through with the appropriate propagation delay
(MSET/MTAG/MCFA address valid from
MAOE#/MBAOE# going active).


MSET0-10, MTAG0-11, and MCFA0-6 are used
as inputs to the 82495XP during snoop cycles. Here,
MAOE#/MBAOE# is inactive. MSET/MTAG/MCFA
are sampled by the 82495XP during snoop
initiation just like the other snoop attributes.


7.30.2 WHEN SAMPLED 


If MALE/MBALE is high and MAOE#/MBAOE# is
low, MSET0-10, MTAG0-11, and MCFA0-6 are
valid with CADS# with a timing reference to CLK.
Otherwise, they are asserted with a delay from
MALE/MBALE high or MAOE#/MBAOE# active.


MSET0-10, MTAG0-11, and MCFA0-6 change
once CNA# or CRDY# is sampled active. MSET0-10,
MTAG0-11, and MCFA0-6 have a float delay
from MAOE#/MBAOE# going inactive. These outputs
are undefined after CRDY#/CNA# assertion
and before the next CADS# assertion.


As inputs during snoop cycles (SNPSTB# asserted),
they must be sampled like other snoop attributes
with proper setup and hold times. In synchronous
snoop mode this is with respect to CLK; in clocked
mode, this is with respect to SNPCLK; and in
strobed mode this is with respect to the SNPSTB#
falling edge.


If MAOE# is inactive and SNPSTB# is not asserted
(no snoop), MSET0-10, MTAG0-11, and MCFA0-6
need not meet any setup or hold time.


7.30.3 RELATION TO OTHER SIGNALS 


MSET0-10, MTAG0-11, and MCFA0-6 are asserted
with CADS# so they are valid when CADS# is
sampled active. This is true as long as MALE/MBALE
is high and MAOE#/MBAOE# is active. If
MSET0-10, MTAG0-11, and MCFA0-6 have been
asserted but are blocked by MALE/MBALE or
MAOE#/MBAOE#, they are asserted from
MALE/MBALE going high or MAOE#/MBAOE# going
active.


MSET0-10, MTAG0-11, and MCFA0-6 are deasserted
or changed with CADS# or CNA# active.
They may also be floated with MAOE# going
inactive.


MSET0-10, MTAG0-11, and MCFA0-6 are used
as inputs during snoop cycles. They are sampled
with SNPSTB# like any other snoop attribute signal.


7.31 MCLK 
Memory Bus Clock 
Input to the 82490XP (Pin 26) 


7.31.1 SIGNAL DESCRIPTION


In clocked memory bus mode, this pin provides the
memory bus clock. Memory bus signals and memory
bus data are sampled on the rising edge of MCLK.
Memory bus write data is driven off MCLK or
MOCLK depending upon the configuration. MCLK
has no relation to CLK.


7.31.3 RELATION TO OTHER SIGNALS 
MCLK shares a pin with MISTB.


In clocked memory bus mode, the MDATA7-
MDATA0, MSEL#, MFRZ#, MBRDY#, MZBT#,
and MEOC# pins are sampled synchronously with
the rising edge of MCLK. In a clocked memory bus
write, MDATA7-MDATA0 are driven synchronously
with MCLK or MOCLK.


MOCLK is a delayed version of MCLK. If a clocked
memory bus configuration is chosen, and the
MOCLK rising edge is detected by the 82490XP
after RESET, data will be driven off of MOCLK rather
than MCLK. Only data is affected by MOCLK.


MOCLK is used to allow the system designer to
increase the minimum hold time of MDATA relative
to MCLK.


7.32 MDATAO-MDATA7 
Memory Bus Data Pins 


82490XP Connection to the Memory Bus 


Input/Output of 82490XP (pins 18, 14, 10, 6, 16, 12, 
8, 4) 


Synchronous to CLK or MCLK or MOCLK or MISTB
or MOSTB.


7.32.1 SIGNAL DESCRIPTION


MDATA0-7 is the 82490XP data bus connection to
the memory bus. All or part of these pins will be used
depending on the cache configuration. These pins
are directly controlled by the MDOE# input. With
MDOE# inactive, these pins are tristated and may
be used as inputs.


For write cycles, the 82495XP asserts CDTS# to
indicate that data will be available at the MDATA
pins or in its buffer. Data is output with respect to
CLK, MCLK, MOCLK, or MEOC# and is strobed
with MBRDY#. In strobed memory bus mode, data
is output using MOSTB.


For read cycles, CDTS# indicates that the CPU data
path will be available for read data in the next clock.
BRDY# reads data into the CPU from the 82490XP.
Data is read into the 82490XPs through MDATA
using MBRDY# or MISTB.


7.32.2 WHEN DRIVEN 


When the CPU or 82495XP initiates a write cycle,
the write data is written to the appropriate 82490XP
buffer and CDTS# is asserted. If MDOE# is active,
that first piece of write data will be available at the
MDATA pins with some delay from the CPU CLK
edge that CDTS# is asserted. Subsequent pieces of
write data are output with some delay from MCLK or
MOCLK (mode dependent) from the edge that
MBRDY# is sampled active. In strobed mode,
subsequent data is output with MOSTB assertion.


MDATA has no valid value before CDTS# assertion,
after MEOC# with no pending cycle, or with MDOE#
inactive.


For read cycles, the 82495XP asserts CDTS# the
clock before the MDATA path is available for read
data. MDOE# must be inactive for the 82490XP to
read data. Read data is strobed into the 82490XP by
asserting MBRDY# on MCLK edges. MEOC# will
latch the last piece of data as it switches buffers. In
strobed mode, data is read by MISTB. Data that is
read into MDATA must meet proper setup and hold
times.


Data at the MDATA inputs need not follow setup and
hold times to MCLK edges that sample MBRDY#
inactive.


7.32.3 RELATION TO OTHER SIGNALS 


CDTS# indicates that write data is in the 82490XP
buffers. If MDOE# is active, write data is available at
MDATA some time after CDTS# or MEOC# is
sampled active. Subsequent write data is available at
MDATA after MBRDY# assertion or MOSTB
changing.


MDOE# must be inactive for MDATA to read data.
CDTS# assertion by the 82495XP indicates that the
read path is available in the next clock. Data must be
read into MDATA with respect to MCLK or MISTB
and must follow proper setup and hold times if
MBRDY# is active or MISTB is changing.


The memory bus controller must account for the
large setup time required to read data into the CPU.
If properly done, data can be read into MDATA by
asserting MBRDY# and in the next full CPU clock
read into the CPU using BRDY#.


7.33 MDOE#


Memory Data Output Enable
Tristates/Enables Memory Data Outputs
Input to 82490XP (pin 20) Cycle Control Signal
Asynchronous


7.33.1 SIGNAL DESCRIPTION 


MDOE# is an input to the 82490XP that, when
asserted, causes the 82490XP to drive its MDATA0-
MDATA7 outputs. When MDOE# is inactive, these
lines are floated and may be used as inputs to the
82490XP. MDOE# is not sampled by any clock and
is a direct connection to the 82490XP memory output
driver.


7.33.2 WHEN SAMPLED 


Since MDOE# is a direct connection to the
82490XP memory output drivers, MDOE# must
always be driven to a valid level. When MDOE# goes
from inactive to active, data in the 82490XPs may be
driven to the MDATA outputs with some propagation
delay from MDOE# going active. Similarly, there is
some float delay from MDOE# going inactive.


MDOE# must be inactive for the 82490XP to read
memory data.


7.33.3 RELATION TO OTHER SIGNALS 


MDOE# has no relation to MCLK, MOCLK, or
MOSTB. Since MDOE# controls the final stage of
the MDATA output buffers, it has no effect on any
other signal of the 82490XP.


7.34 MEMLDRV 


Memory Low Capacitance Drivers 


Selects the Low Capacitance Drivers for the
82495XP and the 82490XP
Inputs to 82495XP and 82490XP (pins Q4, 24)
Configuration Signal


Synchronous to CLK 


7.34.1 SIGNAL DESCRIPTION 


MEMLDRV is a pin on both the 82495XP and
82490XP that, when high during reset, selects the
normal memory output drivers. If this pin is driven
low at reset, the high capacitance drivers are
selected. Specifically, these are the 82495XP address
outputs to the memory bus, and the 82490XP MDATA
outputs. The normal output drivers are designed to
drive up to 50 pF loads. The high capacitance
drivers can drive up to 100 pF without derating.
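

A minimal sketch of this reset-time selection follows. The function name memory_driver is hypothetical, and the load figures are simply the 50 pF and 100 pF values quoted above; no timing or derating behavior is modeled.

```python
from typing import Tuple


def memory_driver(memldrv_high_at_reset: bool) -> Tuple[str, int]:
    """Return (driver type, rated load in pF) selected by MEMLDRV at reset."""
    if memldrv_high_at_reset:
        # MEMLDRV high during reset: normal drivers, rated for loads up to 50 pF.
        return ("normal", 50)
    # MEMLDRV low during reset: high capacitance drivers, up to 100 pF without derating.
    return ("high capacitance", 100)


print(memory_driver(True))
print(memory_driver(False))
```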


7.34.2 WHEN SAMPLED 


MEMLDRV is sampled as shown in Figure 7-1 with a
setup time of 4 CPU clocks for the 82495XP and 1 CPU
clock for the 82490XP. On the 82495XP, MEMLDRV
becomes the SYNC# input once FSIOUT# goes
inactive. On the 82490XP, MEMLDRV becomes the
MFRZ# signal, which is sampled after the first
memory cycle begins.


7.34.3 RELATION TO OTHER SIGNALS 


MEMLDRV shares a pin with SYNC# on the
82495XP and MFRZ# on the 82490XP.


7.35 MEOC# 


Memory End of Cycle
Ends a cycle in 82490XP by switching buffers 
Input to 82490XP (pin 23) Cycle Control Signal 


Synchronous to MCLK or Asynchronous (strobed 
mode) 


7.35.1 SIGNAL DESCRIPTION


MEOC# is an input to the 82490XP that ends the
current cycle and switches memory buffers for a new
cycle. Switching to the next cycle does not cause
information to be lost in the memory or CPU buffers
in the 82490XP, but rather switches new buffers to
the memory I/O bus of the 82490XP.


MEOC# is provided so that the memory system,
which is synchronous to MCLK, can switch to a new
cycle without synchronization. In clocked memory
bus mode MEOC# is sampled with the rising edge
of MCLK. In strobed memory bus mode the MEOC#
function is performed with rising or falling edges of
MEOC#.


For read or write cycles, MEOC# may be activated
on or after the clock edge of the last MBRDY# of
the current cycle. If a cycle is pending (pipelining is
used), the next cycle will flow through with a
propagation delay from MEOC# assertion. MEOC# is
required for all memory bus cycles.


In addition to switching memory buffers, MEOC#
does three other things. One, MEOC# activation
causes the memory burst counter to be reset to its
start value and, if MSEL# is active, MZBT# is
sampled. This allows MSEL# to stay active between
cycles. Two, MEOC# activation during a write cycle
causes MFRZ# to be sampled for a subsequent
allocation (line-fill). Three, MEOC# latches in the
last slice of data (like MBRDY#) before switching
buffers.


7.35.2 WHEN SAMPLED 


In clocked memory bus mode, MEOC# is sampled
on every MCLK edge. It must always observe setup
and hold times to MCLK. In strobed memory bus
mode, MEOC# is always sampled and must meet
proper active/inactive times.


7.35.3 RELATION TO OTHER SIGNALS 


MEOC# is provided so that a cycle may end on the
memory bus before CRDY# can be asserted. The
implication rules surrounding MEOC# are:


1. MEOC# < CRDY# 


2. MEOC# for cycle N+1 = 2 clocks after CRDY#
of cycle N


3. MEOC# for cycle N+1 > 2 clocks after last
BRDY# of cycle N


4. MEOC# > BGT# 


MEOC# active with MSEL# active causes the
sampling of MZBT# and MFRZ#.


7.36 MFRZ# 


Memory Data Freeze 

Freezes Memory Write Data in 82490XP Buffer 
Input to 82490XP (pin 24) Cycle Control Signal 
Synchronous to MCLK or Strobed 


7.36.1 SIGNAL DESCRIPTION 


MFRZ# is an input to the 82490XP that, when active,
causes the 82490XP to "freeze" write data in the
82490XP memory buffer and allow a subsequent
allocation to fill a cache line around it. MFRZ# is
provided so that an actual write to memory need not be
done to perform an allocation. Using MFRZ# to
perform this dummy write cycle requires that the
memory bus controller put the allocated line into the "M"
state.


PALLC# must be active and MKEN# must be
returned active for the write cycle to be turned into an
allocation. MFRZ# is sampled when MEOC# goes
active at the end of the write cycle. The subsequent
line fill is then filled around the write data to
complete the allocation.


7.36.2 WHEN SAMPLED 


In clocked memory bus mode, MFRZ# is sampled 
with the MCLK rising edge that MEOC# is sampled 
active for all CPU write cycles. MFRZ# need only 
follow a proper setup and hold time in this situation. 


In strobed mode, MFRZ# is sampled with the falling 
edge of MEOC# for write cycles. MFRZ# need only 
follow a proper setup and hold time in this situation. 


7.36.3 RELATION TO OTHER SIGNALS


MFRZ# is sampled with MEOC# going active or
being active for write cycles. MFRZ# is used so that
a dummy write cycle can be performed. If an
allocation is done, DRCTM# must be asserted during the
SWEND# window of the line fill to put the allocated
line in the "M" state.


MFRZ# shares a pin with the MEMLDRV
configuration input.


7.37 MHITM#


Memory Bus Hit [M]
Indicates snoop hit to modified line
Output from 82495XP (pin H4) Snooping Signal
Sync to CLK


7.37.1 SIGNAL DESCRIPTION 


The MHITM# output is driven by the 82495XP dur- 
ing a snoop cycle to indicate that the snooping ad- 
dress has hit a Modified line. If the signal is logic 
high, the snoop has not hit a modified line; if the 
signal is logic low, the snoop has hit a modified line. 
When a snoop hits a modified line, the 82495XP
automatically schedules a write-back of the hit
modified line to the memory bus.


When the device which controls the memory bus
(the master) performs a memory access, a snoop is
requested of all other caching devices on the bus
(snoopers). An asserted MHITM# pin from any of
the snooper 82495XPs alerts the master that main
memory's data is stale, and that the bus must be
temporarily given to the snooper which has its
MHITM# asserted so that the modified line can be
written out to the memory bus.
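

From the master's point of view, the check described above reduces to testing whether any snooper drives MHITM# low. The sketch below illustrates that; master_must_yield_bus is a hypothetical helper, and each list entry is the logic level (1 for high, 0 for low) seen on one snooper's MHITM# pin.

```python
from typing import Sequence


def master_must_yield_bus(mhitm_levels: Sequence[int]) -> bool:
    """Return True if any snooper reports a hit to a modified line."""
    # MHITM# is active low: logic 0 means the snoop hit a modified line.
    return any(level == 0 for level in mhitm_levels)


print(master_must_yield_bus([1, 1, 1]))  # no modified hit, memory data is current
print(master_must_yield_bus([1, 0, 1]))  # second snooper must write back its line
```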


7.37.2 WHEN DRIVEN 


The snoop lookup is performed in the clock in which
SNPCYC# is asserted. The MHITM# result for the
snoop is driven on the CLK following SNPCYC#,
and remains valid until the next assertion of
SNPSTB#. The MHITM# signal is not valid from
SNPSTB# until the CLK after SNPCYC#.


7.37.3 RELATION TO OTHER SIGNALS 


MHITM# and MTHIT# outputs together indicate the
results of a snoop lookup in the 82495XP.


An 82495XP can accept a snoop request while
performing memory bus transfers of its own. If a snoop
is requested of an 82495XP while it is performing a
data transfer of its own, the results of the snoop may
be delayed. If SNPSTB# is sampled at an 82495XP
after it has received BGT# for its own cycle, the
snoop lookup is performed (SNPCYC# active) after
the SWEND# of its own cycle, and MHITM# is
driven with valid results one CLK after SNPCYC# (see
Section 6.2.4 and the rest of Section 6.2).


7.38 MISTB 


Memory Bus Input Strobe 

Strobes data into the 82490XP 

Input to 82490XP (pin 22) Cycle Control Signal 
Asynchronous


7.38.1 SIGNAL DESCRIPTION 
MISTB is an input to the 82490XP that, on rising or
falling edges, causes the 82490XP to latch its
MDATA inputs. MISTB is used in strobed memory bus
mode. In clocked memory bus mode, this pin is the
MBRDY# input.


7.38.2 WHEN SAMPLED


MISTB is always sampled by the 82490XP. MISTB 
must meet proper strobed mode active and inactive 
times. 


7.38.3 RELATION TO OTHER SIGNALS


MISTB causes the latching of the 82490XP MDATA
inputs in strobed mode. MISTB shares a pin with
MBRDY#.


7.39 MKEN#


Memory Cache Enable
Determines 82495XP and CPU cacheability
Input to 82495XP (pin R1) Cycle Attribute Signal
Synchronous to CLK


7.39.1 SIGNAL DESCRIPTION 


MKEN# is an input to the 82495XP that is sampled
at the closing of the cacheability window (KWEND#
is sampled active). The 82495XP drives KEN# back
to the CPU one clock after sampling the value of
MKEN#. MKEN# thus determines whether the
current cycle is cacheable in the 82495XP and in the
CPU.


For read cycles, if MCACHE# is active (cacheable),
KEN# is driven out of the 82495XP to the CPU to
indicate cacheability. If MKEN# is sampled inactive
during KWEND# activation, KEN# is brought
inactive by the 82495XP, and the line will not be
cacheable by the CPU or 82495XP. If MCACHE# is
inactive, the line will be non-cacheable regardless of
MKEN#. PCD active will cause MCACHE# to be
inactive.


MKEN# is sampled during write-through cycles that
are potentially allocatable (PALLC# is active during
the write cycle). If MKEN# is sampled active during
KWEND# activation of the write cycle, an allocation
will occur, and a line-fill will follow the write cycle.
MKEN# during the line-fill is ignored. The MBC
indicates to the 82495XP that it intends to perform an
allocation by asserting MKEN#.


MKEN# must be sampled 1 clock before the first
BRDY# assertion to make a line-fill non-cacheable
to the CPU.
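

For read cycles, the relationship between MCACHE#, MKEN#, and KEN# described in this section amounts to a two-input AND. The sketch below enumerates it as a truth table; ken_active is a hypothetical helper, True means the signal is active (low), and timing (the one-clock delay from MKEN# to KEN#) is not modeled.

```python
from itertools import product


def ken_active(mcache_active: bool, mken_active: bool) -> bool:
    """KEN# to the CPU is active only if the cycle is cacheable and MKEN# is sampled active."""
    return mcache_active and mken_active


# Truth table keyed by (MCACHE# active, MKEN# active).
truth_table = {
    (mcache, mken): ken_active(mcache, mken)
    for mcache, mken in product((True, False), repeat=2)
}

for (mcache, mken), ken in truth_table.items():
    print(f"MCACHE# active={mcache!s:5} MKEN# active={mken!s:5} -> KEN# active={ken}")
```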


7.39.2 WHEN SAMPLED 


MKEN# is sampled on the clock edge that
KWEND# is first sampled active. In all other places
MKEN# may violate setup and hold times.


7.39.3 RELATION TO OTHER SIGNALS


MKEN# and MRO# are sampled with KWEND#
active. MKEN# must be sampled at least 2 clocks
before BRDY# assertion to make a line-fill
non-cacheable.


7.40 MOCLK


Memory Data Output Clock 

Separate Clock Reference for Memory Data Output 
Input to 82490XP (pin 27) 

Asynchronous 


7.40.1 SIGNAL DESCRIPTION


MOCLK is the latch enable for the 82490XP memory
data outputs (MDATA). MOCLK controls the latching
of a transparent latch which, when high, causes
MDATA to be driven from MCLK. When low, MDATA
is latched. MOCLK may only be used in clocked
memory bus mode and only affects output data. It is
provided so that a greater MDATA output hold time
can be generated.


To be used effectively, MOCLK must be a clock
input that is skewed from MCLK. The following picture
shows how MOCLK has increased the hold time of
the output burst data:


[Timing diagram 240956-32: MCLK, MOCLK, MDATA, and
MBRDY# waveforms]


7.40.2 WHEN SAMPLED


MOCLK is sampled during and after RESET to
determine whether output data should be driven from
MCLK or MOCLK. If toggling, MOCLK controls the
MDATA outputs with MCLK. If high, data is driven
from MCLK alone. Regardless, input data is never
referenced to MOCLK.


In strobed memory bus mode the MOCLK pin
becomes MOSTB. MOCLK is only used in clocked
memory bus mode.


7.40.3 RELATION TO OTHER SIGNALS


To be used effectively, MOCLK must be the same
frequency as MCLK but be skewed. This effectively
increases MDATA hold time to main memory. Main
memory must sample the data on MCLK edges.


MOCLK shares a pin with the MOSTB signal. 


7.41 MOSTB 


Memory Bus Output Strobe 

Strobes data out of 82490XP 

Input to 82490XP (pin 27) Cycle Control Signal 
Asynchronous 


7.41.1 SIGNAL DESCRIPTION 


MOSTB is an input to the 82490XP that, on rising 
and falling edges, causes the 82490XP to output 
data through its MDATA outputs. MOSTB is only 
used in strobed memory bus mode. In clocked mem- 
ory bus mode, MOSTB is the MOCLK input. 


7.41.2 WHEN SAMPLED
MOSTB is always sampled by the 82490XP. MOSTB 
must meet strobed mode active and inactive times. 


7.41.3 RELATION TO OTHER SIGNALS


MOSTB strobes data out of the 82490XP through 
MDATA. MOSTB shares a pin with MOCLK. 


7.42 MRO#


Memory Read-Only 

Designates current line as read-only 

Input to 82495XP (pin J1) Cycle Attribute Signal 
Synchronous to CLK 


7.42.1 SIGNAL DESCRIPTION 


MRO# is an input to the 82495XP that is sampled at
the closing of the cacheability window (KWEND#
activation). If sampled active, it causes the current
line fill to the 82495XP to be put in the read-only
state, and causes the line to be non-cacheable to
the CPU. Writes to read-only lines in the 82495XP
are treated as write-misses that are non-allocatable
(PALLC# is inactive). MRO# is a bit in each
82495XP tag entry.


Once MRO# is sampled active during KWEND#
activation, KEN# to the CPU is driven inactive
regardless of the state of MKEN#. MKEN# does,
however, determine whether the 82495XP will cache the
read-only line. Once MRO# is returned active, the
CPU will only require the number of transfers as
indicated by LEN and CACHE#. If MKEN# is returned
active, the 82495XP will require an entire cache line.
82495XP read-only cache lines are filled to the [S]
state.
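

The interaction between MRO# and MKEN# at the close of the cacheability window can be sketched as follows. This is an illustration under assumptions, not the device's logic: FillAttributes and line_fill_attributes are hypothetical names, the cycle is assumed to be a read with MCACHE# active, and True means a signal was sampled active.

```python
from typing import NamedTuple


class FillAttributes(NamedTuple):
    ken_to_cpu_active: bool   # line is cacheable in the CPU
    cached_in_82495xp: bool   # 82495XP requires and caches the entire line
    read_only: bool           # line is held in the read-only state


def line_fill_attributes(mken_active: bool, mro_active: bool) -> FillAttributes:
    if mro_active:
        # MRO# active forces KEN# inactive regardless of MKEN#;
        # MKEN# still decides whether the 82495XP caches the read-only line.
        return FillAttributes(False, mken_active, mken_active)
    # MRO# inactive: MKEN# alone decides cacheability for both devices.
    return FillAttributes(mken_active, mken_active, False)


print(line_fill_attributes(mken_active=True, mro_active=True))
```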


The line-fill portion of an allocation may be filled to
the read-only state by returning MRO# active during
KWEND# of the line-fill. MRO# is ignored during
the write portion.


If MRO# is returned active during KWEND#,
DRCTM# and MWB/WT# are ignored during
SWEND#.


MRO# must be returned to the 82495XP at least 2
clocks before BRDY# is returned to the CPU so
KEN# can be sampled properly.


There is one Read-Only bit per tag in the 82495XP.


7.42.2 WHEN SAMPLED


MRO# is sampled on the first clock that KWEND#
is sampled active. In all other clocks, MRO# need
not follow setup and hold times.


7.42.3 RELATION TO OTHER SIGNALS 


MRO# and MKEN# are sampled with KWEND#
activation. MRO# must be returned at least 2 clocks
prior to the first BRDY#.


7.43 MSEL# 


Memory Buffer Chip Select 

Selects 82490XP, Causes Sampling of MZBT # 
Input to 82490XP (pin 25) Cycle Control Signal
Synchronous to MCLK or Strobed


7.43.1 SIGNAL DESCRIPTION


MSEL# is an input to the 82490XP that has 3 main
functions. One, MSEL# active qualifies the
MBRDY# input to the 82490XP. If MSEL# is
inactive for a particular 82490XP, MBRDY# will not be
recognized by that 82490XP.


Two, MSEL# going active causes the sampling of
MZBT# for the next transfer.


Three, MSEL# going inactive resets the 82490XP 
internal memory burst counter. The 82490XP con- 
tains a memory burst counter that counts through 
the CPU burst order with each MBRDY # assertion 
and increments a pointer to the 82490XP memory 
buffer being accessed. 


MSEL# going inactive will reset this burst counter to
its original burst value. By resetting this counter
before MEOC# assertion, all information currently
being read into the 82490XP is lost, but information
that is being written out is maintained and may be
rewritten.


In general, MSEL# may stay inactive for single
transfer cycles such as posted 64-bit write cycles.
Once active, MSEL# need not go inactive as the
burst counter is reset with MEOC# activation. Since
MZBT# may also be sampled with MEOC#, it is
possible to leave MSEL# asserted throughout most
basic transfers.


MSEL# or MEOC# must be used to reset the burst
counter before any transfer begins. If transfers are
interrupted (by a snoop hit before BGT# assertion,
for example), MSEL# must be brought inactive so
the burst counter may be reset for the snoop write
back.


MSEL# must be sampled inactive for at least 1
MCLK after reset. This resets the memory burst
counter for the first transfer.


7.43.2 WHEN SAMPLED 


In clocked memory bus mode, MSEL# is sampled
with all rising edges of MCLK. In this mode, if
MSEL# is sampled inactive, the memory burst
counter is reset and MZBT# is sampled. If MSEL#
is sampled active and MBRDY# is sampled active,
the memory burst counter is incremented. Since it is
constantly sampled with MCLK, MSEL# must always
be driven to a known state and must always
meet setup and hold times to every MCLK edge.


In strobed mode, MSEL# falling edge causes the
sampling of MZBT#. While MSEL# is active, MISTB
and MOSTB cause the memory burst counter to be
incremented. The rising edge of MSEL# causes the
memory burst counter to be reset.


MSEL# must be inactive sometime after RESET
before the first transfer to initialize the burst counter.
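

The clocked-mode rules in 7.43.2 can be sketched as a small Python
model. This is an illustrative simplification, not part of the 82490XP
specification: the class and method names are hypothetical, arguments
are pin levels (True = high, so an active-low signal is asserted when
False), the counter simply counts transfers rather than stepping
through the CPU burst order, and the MEOC# reset path is left out.

```python
class BurstCounterModel:
    """Toy model of the memory burst counter in clocked memory bus mode."""

    def __init__(self):
        self.count = 0          # transfers counted since the last reset
        self.mzbt_level = None  # last MZBT# pin level that was sampled

    def mclk_rising_edge(self, msel_n, mbrdy_n, mzbt_n):
        """Apply one MCLK rising edge and return the counter value."""
        if msel_n:
            # MSEL# sampled inactive: reset the counter, sample MZBT#.
            self.count = 0
            self.mzbt_level = mzbt_n
        elif not mbrdy_n:
            # MSEL# and MBRDY# both sampled active: count one transfer.
            self.count += 1
        # MSEL# active with MBRDY# inactive leaves the counter unchanged.
        return self.count
```

Feeding the model one call per MCLK edge shows why MSEL# must always be
driven to a known level: every edge either resets, increments, or holds
the counter.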


7.43.3 RELATION TO OTHER SIGNALS 


MSEL# causes the sampling of MZBT#, and qualifies
the use of MBRDY#, MOSTB, and MISTB.
Since MSEL# acts as a qualifier for these signals,
MSEL# may be asserted at the same time as
MBRDY#, MOSTB, or MISTB.


7.44 MTHIT#


Memory Bus Tag Hit

Indicates snoop hit

Output from 82495XP (pin G3) Snooping Signal
Sync to CLK


7.44.1 SIGNAL DESCRIPTION 


The MTHIT# output is asserted by the 82495XP
during snoop cycles to indicate that the snoop address
has hit a line in the 82495XP cache. An asserted
MTHIT# signal from any of the snooping
82495XPs alerts a bus master that the data being
accessed resides in another cache. If SNPINV was
not asserted on the snoop request, the copy of the
data in an 82495XP asserting MTHIT# will remain
valid and in the Shared state, so a caching master
must also place its copy of the data in the Shared
state.


7.44.2 WHEN DRIVEN 


The snoop lookup is performed in the CLK in which
SNPCYC# is asserted. The MTHIT# result for the
snoop is driven on the next CLK and remains valid
until the next assertion of SNPSTB#. The MTHIT#
signal is not valid from SNPSTB# until the CLK after
SNPCYC#.


7.44.3 RELATION TO OTHER SIGNALS


MTHIT# and MHITM# together indicate the results
of a snoop lookup in the 82495XP.


An 82495XP can accept a snoop request while performing
memory bus transfers of its own. If a snoop
is requested while it is performing a transfer of its
own, the results of the snoop may be delayed. If
SNPSTB# is sampled at an 82495XP after it has received
BGT# for its own cycle, the snoop lookup is
performed (SNPCYC# active) after the SWEND# of
its own cycle, and MTHIT# is driven with the valid
result one CLK after SNPCYC# (see Sections 6.2.4
and 6.2.5).


Because an asserted MTHIT# from any snooping
82495XP requires the master to place the fetched
line in the Shared state (unless it is an invalidating
snoop), the memory bus controller should include
the MTHIT# signals of other processors when generating
the MWB/WT# signal to its own 82495XP.
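

As a rough illustration of that recommendation, the following Python
sketch shows how a memory bus controller might decide whether to
request a Shared ([S]) line-fill. The function name and the
write_through_region parameter are hypothetical, and MTHIT# inputs are
passed as pin levels (False = low = hit). It returns only the decision,
not the level to drive on the MWB/WT# pin.

```python
def should_fill_shared(other_mthit_n, snoop_invalidating,
                       write_through_region=False):
    """Return True when the line-fill should be forced to the [S] state.

    other_mthit_n: iterable of MTHIT# pin levels from the other caches.
    snoop_invalidating: True if the snoop was issued with SNPINV asserted.
    write_through_region: True if the address is designated write-through.
    """
    if write_through_region:
        return True
    if snoop_invalidating:
        # Hits in other caches are invalidated, so no copy stays shared.
        return False
    # Any MTHIT# driven low means another cache keeps a valid copy.
    return any(not level for level in other_mthit_n)
```

For example, should_fill_shared([True, False], False) requests a Shared
fill because one other cache reported a hit on a non-invalidating snoop.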


7.45 MWB/WT#


Memory Write-back/Write-through

Forces lines to be filled to the [S] state

Input to 82495XP (pin K3) Cycle Attribute Signal
Synchronous to CLK


7.45.1 SIGNAL DESCRIPTION


MWB/WT# is an input to the 82495XP that is sampled
at the closing of the snoop window (SWEND#
activation). If sampled active, the current line-fill is
filled to the [S] state in the 82495XP. The [S] state
is a write-through state in the 82495XP.


MWB/WT# is useful in several cases. If a cache-to-cache
transfer updates memory and leaves the data
valid in the other cache, the line must be filled to the
[S] state instead of the default [E] state. A portion of
memory may also be designated as write-through by
asserting MWB/WT# for the appropriate addresses.


MWB/WT# has no effect on the 82495XP if
DRCTM# is sampled active or MRO# has been
sampled active during KWEND#. If PWT is active,
MWB/WT# has no effect and the line is filled to the
[S] state.


7.45.2 WHEN SAMPLED 


MWB/WT# is sampled on the first clock edge that 
SWEND# is sampled active. If MWB/WT# is not 
being sampled, it need not follow setup and hold 
times. 


7.45.3 RELATION TO OTHER SIGNALS 


Both MWB/WT# and DRCTM# are sampled with
SWEND#.


7.46 MX4/MX8#, MTR4/MTR8#


Memory 4/8 I/O bits, Memory 4/8 Transfers

Selects MDATA Input/Output width and number of
memory bus transfers

Inputs to 82490XP (pins 21, 25) Configuration Signals
Synchronous to CLK


7.46.1 SIGNAL DESCRIPTION


MX4/MX8# configures the 82490XP to use
MDATA[0:3] or MDATA[0:7] memory bus I/O pins.
MTR4/MTR8# selects whether a cache line will
take 4 or 8 transfers. These selections depend on
the line ratio (82495XP line size / CPU line size) and
must be configured according to the following table:

Line  | MX4/ | MTR4/ | Membus   | CPUbus
Ratio | MX8# | MTR8# | I/O Pins | I/O Pins

[Table rows are illegible in the source scan.]


7.46.2 WHEN SAMPLED


These signals are sampled as shown in Figure 7-1 with a
setup time of 1 clock. Once the first CADS# is issued
by the 82495XP, these pins are sampled for the
MZBT# and MSEL# functions.


7.46.3 RELATION TO OTHER SIGNALS 


MX4/MX8# shares a pin with MZBT#, and
MTR4/MTR8# shares a pin with MSEL#.


7.47 MZBT#


Memory Zero Base Transfer

Forces cycles to begin at subline address 0

Input to 82490XP (pin 21) Cycle Control Signal
Synchronous to MCLK or Strobed


7.47.1 SIGNAL DESCRIPTION


MZBT# is an input to the 82490XP that forces a
read or write cycle to begin with burst address 0
regardless of the CPU generated address.


MZBT# is sampled before the transfer begins.
MZBT# is sampled with MSEL# and MEOC#.
MZBT# is sampled with MSEL# going active for the
current cycle. If MSEL# stays active between cycles,
MZBT# is sampled with MEOC# going active
for the previous cycle.


Once sampled, data input to the 82490XPs will start
at burst address 0 and continue through 4, 8, C, etc.


If the CPU is requesting a burst location other than
0, the memory bus controller must hold off any
BRDY# until that bursted item is read from the
memory bus.


7.47.2 WHEN SAMPLED 


In clocked mode, MZBT# is sampled in two locations.
First, MZBT# is sampled on all MCLK rising
edges where MSEL# is sampled inactive. Once
MSEL# is sampled active, the value of MZBT# that
was sampled one clock before is used for the next
transfer.


Second, MZBT# is sampled on MCLK rising edges
where MEOC# is sampled active with MSEL# active.
The MZBT# value sampled will be used for the
next transfer. This allows MSEL# to stay asserted
between transfers if so desired.


In strobed mode, MZBT# is sampled with the same
two signals. First, it is sampled with the falling edge
of MSEL#. Second, it is sampled with the falling
edge of MEOC# if MSEL# is active.


In clocked memory bus mode MZBT# must follow
setup and hold times to all MCLK edges where
MSEL# is sampled inactive or MEOC# is sampled
active with MSEL# active.


In strobed memory bus mode MZBT# must meet
setup and hold times to MSEL# falling edge and
MEOC# falling edge if MSEL# is active.


7.47.3 RELATION TO OTHER SIGNALS


MZBT# is sampled with MSEL# and MEOC# and
has no effect otherwise. In systems that will never
force a zero-based transfer, MZBT# may be driven
high after RESET.


MZBT# shares a pin with the MX4/MX8# configuration
input.


7.48 NCPFLD# 


Non-Cacheable PFLD
Enables Non-Cacheable Floating Point Loads


Input to 82495XP (pin N4) Configuration Signal
Asynchronous


7.48.1 SIGNAL DESCRIPTION 


During RESET, this pin functions as the NCPFLD#
configuration signal. The 82495XP can be configured
to decode i860 XP CPU PFLD (Pipelined Floating
Point Load) cycles. The 82495XP supports 3 operational
modes for PFLD cycle decoding as defined
by FPFLDEN and NCPFLD#:

Mode #1. PFLD cycles that are cached in the
         82495XP.

Mode #2. PFLD cycles not cached in the 82495XP,
         without an external PFLD extension
         FIFO.

Mode #3. PFLD cycles not cached in the 82495XP,
         with an external PFLD extension FIFO.


FPFLDEN | NCPFLD#

[Table rows are illegible in the source scan.]


See Section 5.2.5 for details.


7.48.2 CASES IT IS ASSERTED AND
DEASSERTED


NCPFLD# is sampled on the falling edge of RESET
and is a don't care at any other time. NCPFLD#
must be valid for at least 10 CLKs before RESET's
falling edge.


7.48.3 RELATION TO OTHER SIGNALS 


NCPFLD# shares a pin with FLUSH#. Both 
NCPFLD# and FPFLDEN describe the PFLD mode 
used. 


7.49 NENE# 


Next Near 


Indicates current cycle address is near previous one.


Output from 82495XP (pin D5) Cycle Control Signal 
Synchronous to CLK 


7.49.1 SIGNAL DESCRIPTION 


NENE# indicates to the MBC that the address of
the requested memory cycle is "near" the address
of the previously generated one (in the same 2K
DRAM page). This information may be used by the
MBC to optimize access to paged or static column
DRAMs.
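

The "near" test can be pictured as a same-page comparison. The sketch
below is only an illustration: it assumes that "2K" means a page of
2048 byte addresses, which the text does not state explicitly, and the
function name is hypothetical.

```python
PAGE_SIZE = 2 * 1024  # assumption: a 2K page spans 2048 byte addresses


def is_near(prev_addr, addr, page_size=PAGE_SIZE):
    """Return True when both addresses fall in the same DRAM page."""
    return prev_addr // page_size == addr // page_size
```

An MBC could use such a result to keep a DRAM row open for the next
access instead of starting a new row access.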


7.49.2 WHEN DRIVEN 


NENE# is valid together with CADS# and will stay
valid until CNA# or CRDY#.


7.49.3 RELATION TO OTHER SIGNALS 


Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


NENE# may change state after CNA# or CRDY#
are asserted to the 82495XP.


7.50 PALLC# 


Potential Allocate 

Indicates 82495XP intent to allocate current cycle 
Output from 82495XP (pin D2) Cycle Control Signal
Synchronous to CLK


7.50.1 SIGNAL DESCRIPTION


PALLC# indicates to the MBC that the current write
cycle may allocate (perform a line-fill on) a cache
line. The MBC chooses to perform an allocation by
asserting MKEN# during KWEND# of the write cycle.
Potential allocate cycles are cycles which are
82495XP misses with PCD and PWT inactive.


The exact condition for assertion of PALLC# is:
Miss * !PCD * !PWT * LOCK# * W/R# * D/C# * M/IO#


PALLC# is inactive (HIGH) for any write-hit to a
Read-Only line.
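

The assertion condition can be written out as a Boolean function. This
sketch reads the condition as a product of pin levels, with PCD and PWT
inactive as the prose states, and with LOCK#, W/R#, D/C# and M/IO# all
high (an unlocked memory data write). That reading is an assumption
about the printed expression, and the function name is hypothetical.

```python
def pallc_asserted(miss, pcd, pwt, lock_n, w_r_n, d_c_n, m_io_n):
    """Return True when PALLC# would be asserted for the current cycle.

    miss is True for an 82495XP miss; every other argument is a pin
    level (True = high).
    """
    return bool(
        miss
        and not pcd   # page cache disable inactive
        and not pwt   # page write-through inactive
        and lock_n    # LOCK# high: not a locked cycle
        and w_r_n     # W/R# high: write cycle
        and d_c_n     # D/C# high: data cycle
        and m_io_n    # M/IO# high: memory cycle
    )
```

A write hit never satisfies the condition because miss is False, which
agrees with PALLC# staying inactive for write hits to Read-Only lines.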


7.50.2 WHEN DRIVEN 


PALLC# is valid in the same CLK as CADS# and is
valid until CRDY# or CNA#.


7.50.3 RELATION TO OTHER SIGNALS


PALLC# is valid with CADS#.


7.51 PAR# 


Parity Selection 


Selects 82490XP as a Parity Device 
Input to 82490XP (pin 32) Configuration Signal
Synchronous to CLK


7.51.1 SIGNAL DESCRIPTION 


PAR# is a strapping option on the 82490XP that,
when strapped low, configures that 82490XP device
to be a dedicated parity device. An 82490XP parity
device must be configured the same as all the other
devices; however, the data lines are defined differently.
CDATA[0:3] are 4 parity bit I/O lines and
CDATA[4:7] are 4 bit select lines so each parity line
may be written individually. Parity devices must be
used as follows:


82490XP I/O Bits (CPU:Mem) | Number of Parity Devices

[Table rows are illegible in the source scan.]


7.51.2 WHEN SAMPLED


PAR# is a strapping option and must be tied either
high or low.


7.51.3 RELATION TO OTHER SIGNALS


PAR# affects the definition of the CDATA and MDATA
lines of the 82490XP.


7.52 RDYSRC 


Ready Source 

Cycle control signal to the MBC 

Output from 82495XP (pin C1) Cycle Control Signal
Synchronous to CLK


7.52.1 SIGNAL DESCRIPTION


RDYSRC serves as a cycle control signal to the
MBC. It indicates the source of the BRDY# generation
(either 82495XP or MBC) for the CPU. When
high, it indicates that the MBC should generate the
BRDY#s to the CPU; when low, it indicates that the
82495XP will provide the BRDY#s.


RDYSRC is asserted for line-fill and not asserted for
the write portion of allocation cycles.


7.52.2 WHEN DRIVEN 


RDYSRC is valid in the same CLK as CADS# and is
valid until CRDY# or CNA#.


7.52.3 RELATION TO OTHER SIGNALS 


Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


7.53 RESET 


Reset


Forces the 82495XP to begin execution in a known
state


Input to 82495XP (pin Q5)
Asynchronous


7.53.1 SIGNAL DESCRIPTION


The falling edge of this signal tells the 82495XP to
sample all configuration inputs and initializes the
82495XP to a known state. See the specific configuration
signals for setup and hold times relative to
RESET's falling edge. RESET can be asserted at
any time.


During initialization, the 82495XP LRU bits are set
to 1 indicating that the 82495XP LRU way is way 1.
The 82490XP MRU bits are initialized to 0 as are all
tag array bits.


RESET takes about 4100 clocks in the 82495XP.


RESET with self-test takes about 80,000 clocks.


7.53.2 WHEN SAMPLED 


RESET is an asynchronous input. RESET must have 
a pulse width of at least 8 CLK’s in order to guaran- 
tee 82495XP recognition. 


7.53.3 RELATION TO OTHER SIGNALS 


The following signals are sampled at RESET:

CNA# [CFG0]:       CFG0 line of 82495XP
                   configuration inputs
SWEND# [CFG1]:     CFG1 line of 82495XP
                   configuration inputs
KWEND# [CFG2]:     CFG2 line of 82495XP
                   configuration inputs
FLUSH# [NCPFLD#]:  If low, enables decoding of
                   i860 XP non-cacheable PFLD
                   mode.
FPFLD# [FPFLDEN]:  If high, enables the external
                   FIFO for i860 XP PFLD mode.
BGT# [C490LDRV]:   Indicates the driving strength of
                   the 82495XP/82490XP
                   interface.
SYNC# [MEMLDRV]:   Indicates the memory bus
                   driving strength.
SNPCLK# [SNPMD]:   Indicates the snooping mode;
                   synchronous or strobed.


CFG2-CFG0 configure cache parameters
such as lines/sector, line ratio,
and number of tags.


7.54 SLFTST#


Self Test 

Executes 82495XP self-test 

Input to 82495XP (pin M2) Test Signal 
Synchronous to CLK 


7.54.1 SIGNAL DESCRIPTION 


If SLFTST# is sampled low and HIGHZ# is sampled
high, the 82495XP will perform a self-test after
reset. The results of the self-tests are given by CAHOLD
when FSIOUT# goes inactive.


7.54.2 WHEN SAMPLED 


SLFTST# is sampled with reset as shown in Figure 7-1 with a
setup time of 10 CPU clocks. SLFTST# is then a
"don't care" until after the first CADS# activation
when it becomes the CRDY# pin.


7.54.3 RELATION TO OTHER SIGNALS 


SLFTST# shares a pin with CRDY#. The 82495XP
enters self-test if both SLFTST# is sampled active
and HIGHZ# is sampled inactive.


7.55 SMLN#


Same Line

Current cycle is same 82495XP line as previous one.
Output from 82495XP (pin C6) Cycle Control Signal
Synchronous to CLK


7.55.1 SIGNAL DESCRIPTION 


SMLN# is used to indicate to the MBC that the current
cycle is accessing the same 82495XP cache
line as the previous cycle. This indication can be
used by the MBC to selectively activate its
SNPSTB# signal to other caches in the system. For
example, back-to-back snoop hits to the same line
may be snooped only once.


7.55.2 WHEN DRIVEN 


SMLN# is asserted with CADS# and will stay valid
until CNA# or CRDY#.


7.55.3 RELATION TO OTHER SIGNALS


Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.


7.56 SNPADS# 


Cache Snoop Address Strobe 

Initiates a snoop write back cycle 

Output from 82495XP (pin F3) Snooping Signal 
Sync to CLK


7.56.1 SIGNAL DESCRIPTION 


The SNPADS# signal indicates valid cache control
and attribute signals, functioning identically to
CADS#, but is generated only on snoop write-backs.
The separation of address status signals for
normal and snoop write-back cycles eases memory
bus controller implementation. When SNPADS# is
activated, the memory bus controller should abort all
pending cycles for which BGT# has not been issued.
The 82495XP reissues these non-committed
cycles after the snoop write-back has completed.


7.56.2 WHEN DRIVEN 


SNPADS# is produced when a snoop hits a modified
line. A modified line condition exists when a line
in the cache has been updated, and copies of that
memory location in other devices are no longer valid.
A snoop is initiated by the master of a shared bus
when accessing a memory location on the shared
bus.


The response of the 82495XP to a snoop appears
on the MTHIT# and MHITM# pins in the clock after
SNPCYC# is active. If these pins are both driven
low, the snoop resulted in a hit to a modified line,
and a snoop write-back is initiated with the assertion
of SNPADS#. SNPADS# is driven, at earliest, two
clocks after SNPCYC#. Like CADS#, SNPADS# is
active for one CLK, and is always valid.


7.56.3 RELATION TO OTHER SIGNALS 


Cycles initiated by SNPADS# require only CRDY#;
they do not require the other cycle progress signals
(BGT#, KWEND#, SWEND#).


The SNPADS# signal is driven by the 82495XP to
indicate the start of the write-back cycle; the
82495XP drives the following address and cycle
specification signals valid with SNPADS#: CW/R#,
CD/C#, CM/IO#, MCACHE#, RDYSRC, NENE#,
SMLN#, and the address on MSET[0:10],
MTAG[0:11], and MCFA[0:6]. Upon assertion of
SNPADS#, the memory bus controller should cancel
all pending cycles for which BGT# has not yet
been asserted, because they will be reissued after
the snoop write-back. The 82495XP will ignore
BGT# while SNPBSY# and MHITM# are active (i.e.,
during the write-back).


The 82495XP can accept a snoop request while performing
memory bus transfers of its own. If a snoop
is requested while it is performing a transfer of its
own, the results of the snoop and any necessary
snoop write-backs may be delayed. If SNPSTB# is
sampled at an 82495XP after it has received BGT#
for its own cycle,
the snoop write-back will occur after CRDY# for the
82495XP's own cycle. See Sections 6.2.4 and 6.2.5
for details.


7.57 SNPBSY#


Snoop Busy

Indicates additional snoop processing in progress
Output from 82495XP (pin F1) Snooping Signal
Sync to CLK


7.57.1 SIGNAL DESCRIPTION 


SNPBSY# and SNPCYC# indicate a snoop in progress.
The SNPCYC# signal is asserted on the actual
snoop look-up to the 82495XP tags. If the snoop
look-up indicates a valid line is hit and the snoop is
invalidating, the 82495XP must perform a back invalidation
on the CPU. If a snoop hit occurs to a modified
line, a snoop write-back must occur. SNPBSY#
is asserted and remains active while either a back
invalidation or a snoop write-back is in progress.


7.57.2 WHEN DRIVEN


SNPBSY# is activated for two conditions. First,
SNPBSY# is activated whenever a back invalidation
is necessary: the snoop returns MTHIT# active and
SNPINV was asserted on the snoop initiation. Second,
SNPBSY# is activated when a modified cache
line is hit on a snoop, as indicated by MHITM#, until
the modified line has been written back (CRDY# returned
for the write-back).


SNPBSY# is valid in the CLK following SNPCYC#,
and if active, remains active for a minimum of two
CLKs.


7.57.3 RELATION TO OTHER SIGNALS


After SNPCYC# occurs for a snoop, a new snoop
may be initiated. If SNPBSY# is asserted for the
initial snoop, the SNPCYC# of the second snoop is
delayed until the SNPBSY# signal is deasserted for
the initial snoop, indicating that its snoop processing
has completed.
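

The two SNPBSY# conditions in 7.57.2, together with the MTHIT#,
MHITM# and SNPADS# behavior described in 7.44 and 7.56, can be
summarized in a short sketch. The function name is hypothetical, values
are logical assertions (True = asserted, not pin levels), and timing is
ignored entirely, so this shows only which outputs become active for a
given snoop.

```python
def snoop_outputs(state, snpinv):
    """Return which snoop outputs are asserted for a snooped line.

    state: MESI state of the snooped line, one of "M", "E", "S", "I".
    snpinv: True if SNPINV was asserted when the snoop was initiated.
    """
    hit = state in ("M", "E", "S")
    hit_modified = state == "M"
    return {
        "MTHIT": hit,
        "MHITM": hit_modified,
        # Busy for a back invalidation or for a snoop write-back.
        "SNPBSY": (hit and snpinv) or hit_modified,
        # A snoop write-back cycle is started only for a modified hit.
        "SNPADS": hit_modified,
    }
```

For instance, snoop_outputs("S", False) reports only MTHIT, so a second
snoop would not be held off by SNPBSY#.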


7.58 SNPCLK [SNPMD] 


Snoop Clock [Snooping Mode] 

Selects 82495XP snooping mode. 

Input to 82495XP (pin S3) Snooping Signal 
Synchronous to CLK 


7.58.1 SIGNAL DESCRIPTION 


SNPMD selects whether 82495XP snoop initiation
is in synchronous, clocked, or strobed mode.
82495XP snoop response is always synchronous to
CLK.


Synchronous mode (to CLK) is selected by SNPMD
sampled low during reset. Strobed mode is selected
by SNPMD sampled high during reset. Clocked
mode is selected by connecting the snoop clock
source to SNPMD, and thus SNPMD becomes the
actual snoop clock (SNPCLK).


7.58.2 WHEN SAMPLED


SNPMD is sampled as shown in Figure 7-1 with a setup time
of 4 CPU clocks. SNPMD is then not used unless
clocked mode is being selected. If clocked mode is
selected, SNPMD becomes SNPCLK to clock in
snoop requests.


7.58.3 RELATION TO OTHER SIGNALS


SNPMD becomes SNPCLK if a clock signal is detected
at reset. In this clocked mode, SNPCLK is
then used to clock in SNPSTB#, the snoop address,
and all snoop attributes.


7.59 SNPCYC# 


Snoop Cycle 

Indicates snoop look-up occurring in 82495XP tags 
Output from 82495XP (pin H3) Snooping Signal 
Sync to CLK


7.59.1 SIGNAL DESCRIPTION


SNPCYC# is asserted by the 82495XP during the
clock when the actual tag look-up for the snoop is
performed. SNPCYC# may appear as early as the
CLK following SNPSTB# assertion, or may be delayed
several clocks while a snoop write-back or
82495XP memory bus cycle takes place.


7.59.2 WHEN DRIVEN 


SNPCYC# is always a valid 82495XP output. It is
asserted once, for a single clock, for every snoop
which is initiated in the 82495XP.


7.59.3 RELATION TO OTHER SIGNALS 


A snoop is initiated by assertion of the SNPSTB#
input if MAOE# is not asserted. The actual snoop,
signalled by the assertion of SNPCYC#, can be delayed
by a prior snoop's write-back in progress
(SNPBSY# asserted) or by an 82495XP memory cycle
in progress (SNPSTB# occurs after BGT#);
see SNPSTB# for details. If neither of these is occurring,
strobed and clocked snooping modes can
also delay snoop look-up for a clock while the snoop
address and attributes are synchronized.


In the clock following SNPCYC#, MHITM# and
MTHIT# report valid snoop results.


7.60 SNPINV 


Snoop Invalidation 

Forces invalidation of snoop hits 

Input to 82495XP (pin P5) Snooping Signal 
Sampled with SNPSTB# (see SNPSTB#) 


7.60.1 SIGNAL DESCRIPTION 


Assertion of the SNPINV signal during the initiation
of a snoop request forces a snoop hit for that request
into the Invalid state.


The SNPINV pin is sampled upon initiation of a
snoop request with SNPSTB# activation, depending
on snooping mode: rising edge of first CLK when
SNPSTB# is asserted (synchronous snooping mode),
or rising edge of first SNPCLK when SNPSTB# is
asserted (clocked mode), or falling edge of
SNPSTB# (strobed mode).


7.60.2 WHEN SAMPLED


When a bus master performs a bus access, the
SNPSTB# of all other 82495XPs is asserted to initiate
a snoop for that address. If the master's access
is one which is modifying the data (a write to memory,
etc.), the SNPINV pin of all snooping 82495XPs
must be asserted during SNPSTB# so that the line
is properly marked Invalid.


SNPINV is not asserted during SNPSTB# assertion
if snoop hits are to remain valid: the master issuing
the snoop does not require their invalidation (a
read).


SNPINV assertion forces all snoop hits to be invalidated,
overriding other inputs or attributes (i.e.,
SNPNCA). When SNPINV is not asserted, cache
states change according to normal protocol.


SNPINV is only sampled with SNPSTB#, which may
be qualified by CLK or SNPCLK depending on the
snooping mode, and must meet setup and hold
times for the edge of its sampling. When SNPSTB#
is not being asserted, SNPINV is a don't care and
need not follow setup and hold times.


7.60.3 RELATION TO OTHER SIGNALS 


SNPINV is sampled according to SNPSTB#, which
may be qualified by SNPCLK or CLK, depending on
the snooping mode. SNPINV overrides the SNPNCA
input, which may also be asserted with SNPSTB#. If
MAOE# is active with SNPSTB# sampling, the
snoop request is ignored.


7.61 SNPNCA 


Snoop Non Caching device Access 


Indicates to snooping 82495XP that the initiating
master is a non-caching device


Input to 82495XP (pin Q3) Snooping Signal 
Sampled with SNPSTB# (see SNPSTB#) 


7.61.1 SIGNAL DESCRIPTION 


SNPNCA indicates that the master which is initiating
the snoop request will not cache the data. If the
SNPNCA pin is not asserted and the snoop is noninvalidating
(where noninvalidating = SNPINV not asserted),
a snoop hit line must be placed in the
Shared state, since the data will exist in another
cache. If SNPNCA is asserted and the snoop is noninvalidating,
a snoop hit line will not be entered into a
new cache, so a hit Exclusive or Modified line will be
placed in the Exclusive state by the 82495XP. A
noninvalidating snoop hit to a Shared line must keep
the hit line in the Shared state, regardless of
SNPNCA.


SNPNCA is sampled upon initiation of a snoop request
with SNPSTB# activation, depending on the
snooping mode: rising edge of first CLK when
SNPSTB# is asserted (synchronous snooping mode),
or the rising edge of SNPCLK when SNPSTB# is
asserted (clocked snooping mode), or the falling
edge of SNPSTB# (strobed snooping mode).


7.61.2 WHEN SAMPLED


To achieve maximum processor performance and
minimum bus traffic, SNPNCA should be asserted
when the noninvalidating snoop is caused by an access
from a non-caching device like a DMA.


If the snoop is being caused by a device which will
also be caching the data, SNPNCA must not be asserted,
so that the 82495XP does not leave the hit
line in an Exclusive state. Subsequent writes to
lines in this state do not appear on the bus, and stale
data would result in the cache which incorrectly asserted
SNPNCA.


If SNPNCA is asserted on a noninvalidating snoop
request, the following outlines the behavior of the
cache for a snoop hit in each of the MESI states:


Modified   The data is written to the bus, and the
           line is placed in the Exclusive state.

Exclusive  The line remains in the Exclusive state.

Shared     The line remains in the Shared state.

Invalid    This is a cache miss. The line remains
           Invalid.


If SNPNCA is NOT asserted on a noninvalidating
snoop request, an M, E, or S state hit line will be
placed in the Shared state. Again, M state causes a
write to the bus, and Invalid lines remain Invalid.


SNPNCA is only sampled with SNPSTB#, which
may be qualified by CLK or SNPCLK depending on
the snooping mode, and must meet setup and hold
times for the edge of this sampling. When
SNPSTB# is not being sampled, SNPNCA is a don't
care and need not follow setup and hold times.


7.61.3 RELATION TO OTHER SIGNALS


SNPNCA is sampled with SNPSTB#, which may be
qualified by SNPCLK or CLK, depending on snooping
mode. The assertion of SNPINV overrides
SNPNCA, and places all snoop hit lines into the Invalid
state. If MAOE# is active on SNPSTB# sampling,
the snoop request is ignored.
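

The snoop state changes described in 7.60 and 7.61 can be collected
into one lookup, sketched below in Python. The function name and the
tuple return value are illustrative choices, not part of the 82495XP
interface. States are the MESI letters, and snpinv and snpnca are True
when the corresponding input was asserted with SNPSTB#.

```python
def snoop_next_state(state, snpinv, snpnca):
    """Return (next_state, writeback) for a snooped cache line.

    state: current MESI state, one of "M", "E", "S", "I".
    writeback is True when the hit line is Modified, because its data
    must be written to the bus.
    """
    if state == "I":
        return "I", False          # cache miss: nothing changes
    writeback = state == "M"
    if snpinv:
        return "I", writeback      # SNPINV overrides SNPNCA
    if state == "S":
        return "S", writeback      # Shared stays Shared either way
    # M or E hit on a noninvalidating snoop.
    return ("E" if snpnca else "S"), writeback
```

A DMA read that hits a Modified line, snoop_next_state("M", False,
True), leaves the line Exclusive after the write-back, which avoids a
later bus write that a Shared line would require.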


7.62 SNPSTB# 


Snoop Strobe 


Initiates 82495XP snoop and latches snoop address
& attributes


Input to 82495XP (pin R3) Snooping Signal 
Sync to CLK or SNPCLK, or strobed 


7.62.1 SIGNAL DESCRIPTION 


Snoop strobe initiates an 82495XP snoop request. It
controls the latching of the snoop address and
snoop attribute signals, in the manner specified by
one of three snooping modes:


Snooping Modes

Mode        | Snoop Address/Attributes Sampled on:
Strobed     | falling edge of SNPSTB#
Clocked     | rising edge of SNPCLK when SNPSTB# sampled active
Synchronous | rising edge of CLK when SNPSTB# sampled active


SNPSTB# must be asserted to initiate a snoop re- 
quest. Snoops are initiated by a bus master for all 
memory accesses, to ensure that data residing in 
other caches is flushed if modified and invalidated if 
necessary. 


SNPSTB# must be deasserted for at least one 
SNPCLK or CLK when clocked or synchronous 
snooping mode (respectively) is used, in order to 


rearm for the next snoop. 


SNPSTB# can be asserted while a snoop is in progress,
allowing one level of pipelining. However, the
reassertion of SNPSTB# while snooping is in progress
must not occur until after SNPCYC#: precisely,
after the falling edge of SNPCYC# for strobed
and clocked modes, or in the clock after SNPCYC#
is active for synchronous mode. SNPSTB# must not
be asserted between the first and last BGT# of a
locked sequence. Similarly, SNPSTB# must not occur
after the BGT# of the write through and before
the BGT# of the allocation when a Read-for-Ownership
transaction is occurring.


SNPSTB# itself does not affect the cache contents
or states, but the snoop signals SNPINV and
SNPNCA, latched upon SNPSTB#, force various
changes in the cache on a snoop hit.


7.62.2 WHEN SAMPLED


SNPSTB# is sampled on every SNPCLK or CLK in
clocked or synchronous modes, and is sampled constantly
in strobed mode. While a snoop is in progress,
a new SNPSTB# is recognized as a new, possibly
pipelined, snoop request. After the assertion of
a pipelined SNPSTB#, the SNPSTB# signal must
not be reasserted until after the next SNPCYC#.


SNPSTB# should always meet proper set-up and 
hold times when operating in clocked or synchro- 
nous modes. When operating in strobed mode, it 
must meet minimum active/inactive times to be 
properly recognized in the next clock. 


7.62.3 RELATION TO OTHER SIGNALS 


SNPSTB# latches the following signals: SNPINV, 
SNPNCA, MBAOE#, and MAOE#, and the address 
on the MSET, MTAG, and MCFA pins. The address 
which appears on the MSET, MTAG, and MCFA ad- 
dress pins is to be snooped in the 82495XP. 
MAOE# acts as a qualifier for a snoop; if MAOE# is
active when sampled on a SNPSTB# assertion, the
snoop request is ignored. SNPINV and SNPNCA
provide the 82495XP with snoop attributes which affect
the state of a snoop hit cache entry.


If MBAOE# is active during SNPSTB# assertion,
the 82495XP forces all bits in the subline address
(those address bits which MBAOE# controls) to 0
on a snoop write back for that snoop.


Snoops and memory accesses are interlocked, such
that after BGT# for a memory access has been issued,
a SNPSTB# which is asserted will be latched,
with its address and attributes, but will not cause a
snoop until after SWEND# for that memory cycle.
After BGT# has been issued for a cycle, snoop
write-backs are delayed until after the CRDY# for
that cycle. Likewise, once a snoop is underway
(SNPCYC# active) BGT# is ignored until snoop
completion.


SNPSTB# must not be deasserted and reasserted
(specifically, cause a second falling edge) between
its initial recognition and SNPCYC#; i.e., SNPSTB#
must not be asserted before the SNPCYC# of the
previous SNPSTB#. In strobed and clocked modes,
SNPSTB# can be reasserted after the falling edge
of SNPCYC#; in synchronous mode, SNPSTB# can
be reasserted in the CLK after SNPCYC# is active.
This second assertion of SNPSTB#, after
SNPCYC#, can occur while the first snoop is still
progressing (SNPBSY# is active), allowing one level
of snoop pipelining. In this case, a third assertion of
SNPSTB# must not occur until after the SNPCYC#
for the second, piped snoop request.


SNPSTB# must not be asserted while the 82495XP
is executing a locked sequence (LOCK# active).
Specifically, SNPSTB# must not be asserted after
the BGT# for the first locked access and before the
BGT# of the last locked access.


Systems which support Read-for-Ownership must
not assert SNPSTB# between the BGT# of the
write through and the BGT# of the allocation during
a Read-for-Ownership operation.
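

The reassertion rule described in this section can be expressed as a small checker. This is only an illustrative sketch: the event names and the function `check_snpstb_sequence` are hypothetical, and the model deliberately ignores LOCK#, Read-for-Ownership, and the per-mode timing of when reassertion becomes legal. It only tracks whether each SNPSTB# has been answered by its SNPCYC# before the next SNPSTB# arrives.

```python
def check_snpstb_sequence(events):
    """Check a time-ordered list of "SNPSTB" / "SNPCYC" events.

    Rule modeled (a simplification of the text above): a new SNPSTB#
    assertion is legal only once the previous SNPSTB# has been followed
    by its SNPCYC#. Returns the index of the first illegal event, or
    None if the whole sequence is legal.
    """
    pending = 0  # SNPSTB# assertions still waiting for their SNPCYC#
    for i, event in enumerate(events):
        if event == "SNPSTB":
            if pending >= 1:
                return i  # reasserted before SNPCYC# of the previous strobe
            pending += 1
        elif event == "SNPCYC":
            if pending == 0:
                return i  # SNPCYC# with no outstanding snoop request
            pending -= 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    return None


# One level of pipelining: the second strobe follows the first SNPCYC#.
legal = ["SNPSTB", "SNPCYC", "SNPSTB", "SNPCYC"]
# Third strobe arrives before the SNPCYC# of the second (piped) request.
illegal = ["SNPSTB", "SNPCYC", "SNPSTB", "SNPSTB"]
print(check_snpstb_sequence(legal), check_snpstb_sequence(illegal))
```

In the model, whether the first snoop is still busy (SNPBSY# active) does not matter; only the SNPCYC# of the previous request gates the next SNPSTB#.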


7.63 SWEND# 


Snoop Window End 

Closes Snooping Window 

Input to 82495XP (pin Q1) Cycle Progress Signal
Synchronous to CLK 


7.63.1 SIGNAL DESCRIPTION 


SWEND# is an input to the 82495XP that, when
asserted, closes the snooping window and causes
sampling of MWB/WT# and DRCTM#. Once
snooping of all other 82495XPs is complete,
DRCTM# and MWB/WT# can be determined.


Snoop response is blocked by the 82495XP
between BGT# and SWEND# activation. Therefore,
the sooner SWEND# is asserted, the sooner the
snoop response can be determined.


All CPU-generated write cycles and cache read miss 
cycles must cause a snoop on the memory bus. 
SWEND# may be activated once snooping has 
completed for these cycles. SWEND# activation 
causes the 82495XP’s internal tags to change state 
for the current cycle (if necessary). DRCTM# and 
MWB/WT# influence the state change decision.


SWEND# need only be activated for those cycles 
which require the sampling of DRCTM# and 
MWB/WT#.


If a cycle does not specifically require SWEND#,
and SWEND# is not returned, snooping is blocked
from BGT# to CRDY#. For this reason, it may be
more efficient to always return SWEND#.


7.63.2 WHEN SAMPLED


SWEND# is sampled by the 82495XP on the clock in
which KWEND# is sampled active, or after it, for
those cycles that sample KWEND#. For cycles that
do not sample KWEND#, SWEND# is sampled with
or after BGT#. Once SWEND# is sampled active, it
is ignored until KWEND# of the next cycle. If
SWEND# is not being sampled, it may violate setup
and hold times.


Snoop response is blocked between BGT# and 
SWEND#. If a snoop is initiated between BGT# 
and SWEND#, the MTHIT# and MHITM# re- 
sponse is given after SWEND# activation. Any sub- 
sequent snoop write back would begin after 
CRDY#.
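

The blocking window described above can be sketched as a small function. This is an illustrative model only: the function name and its integer clock arguments are hypothetical, tag lookup latency is ignored, and the returned value should be read as "not before this clock" rather than an exact response time.

```python
def earliest_snoop_response(snoop_clk, bgt_clk, crdy_clk, swend_clk=None):
    """Return the earliest clock at which MTHIT#/MHITM# may be given.

    Assumptions of this sketch:
      * a snoop arriving at or after BGT# and before the window closes
        is held until the window closes;
      * the window closes at SWEND#, or at CRDY# if SWEND# is never
        returned for the cycle (swend_clk is None).
    """
    window_end = swend_clk if swend_clk is not None else crdy_clk
    if bgt_clk <= snoop_clk < window_end:
        return window_end  # response is blocked until the window closes
    return snoop_clk       # outside the window: no blocking modeled


# Cycle with BGT# in clock 3, SWEND# in clock 7, CRDY# in clock 10.
print(earliest_snoop_response(4, bgt_clk=3, crdy_clk=10, swend_clk=7))
# Same cycle, but the MBC never returns SWEND#.
print(earliest_snoop_response(4, bgt_clk=3, crdy_clk=10))
```

Comparing the two calls shows why returning SWEND# early can be more efficient even for cycles that do not strictly require it.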


7.63.3 RELATION TO OTHER SIGNALS


SWEND# causes the sampling of MWB/WT# and
DRCTM#. SWEND# is sampled once KWEND# is
sampled active. BGT#, KWEND#, and SWEND#
may be asserted in the same clock.


SWEND# shares a pin with CFG1.


7.64 SYNC#


Sync

Synchronizes 82495XP TAG array with Main Memory

Input to 82495XP (04) Cache Synchronization Signal
Asynchronous


7.64.1 SIGNAL DESCRIPTION


SYNC# activation will cause the synchronization of
the 82495XP and i860 XP CPU tag arrays with main
memory. The 82495XP will flush all modified entries
to memory. All valid tag entries will be kept, with
modified [M] state lines becoming non-modified [E]
state lines.
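

The effect of SYNC# on the tag array can be sketched as follows. This is a minimal illustration, assuming a tag array represented as a dictionary from address to a one-letter state ('M', 'E', 'S' or 'I'); the function name `sync` and this representation are hypothetical and say nothing about the order or timing of the actual write-backs.

```python
def sync(tags):
    """Model the tag-state effect of a SYNC operation.

    tags: dict mapping line address -> state, one of "M", "E", "S", "I".
    Returns (new_tags, written_back): modified lines are written back
    and become "E"; every other entry keeps its state.
    """
    new_tags = {}
    written_back = []
    for addr, state in tags.items():
        if state == "M":
            written_back.append(addr)  # flushed to main memory
            new_tags[addr] = "E"       # line stays valid, now unmodified
        else:
            new_tags[addr] = state
    return new_tags, written_back


tags = {0x100: "M", 0x140: "E", 0x180: "S", 0x1C0: "I"}
new_tags, written_back = sync(tags)
print(new_tags, written_back)
```

Unlike a flush that invalidates the cache, no valid entry is lost in this model; only the modified state is cleared.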


7.64.2 WHEN SAMPLED 


SYNC# can be asserted at any time. The 82495XP
will complete all outstanding cycles on the CPU and
memory bus before beginning the SYNC process.
The memory bus controller does not have to prevent
SYNC# during locked cycles because the 82495XP
will complete its locked cycle before the SYNC
process will begin.


Once a SYNC operation has begun, the SYNC# signal
is ignored until the operation completes. If
RESET or FLUSH# is asserted while the SYNC
operation is in progress, the SYNC operation will be
aborted and the RESET or FLUSH immediately
executed.


SYNC# is an asynchronous input. SYNC# must
have a pulse width of 2 CLKs in order to guarantee
82495XP recognition.


7.64.3 RELATION TO OTHER SIGNALS 


To initiate a SYNC, the 82495XP will complete all
pending cycles and prohibit further ADS#s from
occurring while a SYNC is in progress. The FSIOUT#
output signal is used to indicate the start and end of
the SYNC operation. It will become active when the
SYNC# signal is internally recognized (all outstanding
cycles have completed) and will de-activate
when the SYNC operation has completed.


The memory bus controller supplies BRDY# to the
CPU once the SYNC has completed. Once SYNC
has begun, and FSIOUT# is active, all CADS#s and
CRDY#s correspond to the write-backs caused by
the SYNC operation.


The 82495XP can be snooped during SYNC cycles 
and the snooping protocols will be the same as that 
for any memory bus cycle. 


7.65 TCK 


Test Clock


Clock for the JTAG boundary scan tests
Input to the i860 XP CPU (pin Q1) Test Signal
Input to the 82495XP (pin P3)

Input to the 82490XP (pin 3)

Synchronous


7.65.1 SIGNAL DESCRIPTION 


TCK is an input to the i860 XP CPU, 82495XP and
82490XP and provides the clocking function
required by the JTAG boundary scan feature. TCK is
used to clock state information and data into and out
of the component. State select information and data
are clocked into the component on the rising edge
of TCK on TMS and TDI, respectively. Data is
clocked out of the part on the falling edge of TCK on
TDO.


TCK may be used as a free running clock, or it may
be stopped indefinitely in the low (logic 0) state,
as described in IEEE 1149.1. While TCK is stopped
in the low state, the boundary scan latches retain
their state.


When boundary scan is not used, TCK should be
tied low.


7.65.2 WHEN SAMPLED


TCK is a clock signal and is used as a reference for 
sampling other JTAG signals. 


7.65.3 RELATION TO OTHER SIGNALS 


On the rising edge of TCK, TMS and TDI are
sampled. On the falling edge of TCK, TDO is driven.


7.66 TDI 


Test Data Input 

Receives serial test instructions and data 
Input to the i860 XP CPU (pin S14) Test Signal 
Input to the 82495XP (pin N3) 

Input to the 82490XP (pin 2) 

Synchronous to TCK 


7.66.1 SIGNAL DESCRIPTION 


TDI is the serial input used to shift JTAG instructions
and data into the component. The shifting of
instructions and data occurs during the SHIFT-IR and
SHIFT-DR TAP controller states, respectively.
These states are selected using the TMS signal as
described in chapter 9.


An internal pull up resistor is provided on TDI to
ensure a known logic state if an open circuit occurs on
the TDI path. Note that when “1” is continuously
shifted into the instruction register, the BYPASS
instruction is selected.


7.66.2 WHEN SAMPLED 


TDI is sampled on the rising edge of TCK, during the
SHIFT-IR and the SHIFT-DR states. During all other
TAP controller states, TDI is a “don’t care”.


7.66.3 RELATION TO OTHER SIGNALS


TDI is only sampled when TMS and TCK have been 
used to select the SHIFT-IR or SHIFT-DR states in 
the TAP controller. 


For proper initialization of JTAG logic, TDI should be
driven high, “1”, for at least four TCK cycles
following the rising edge of RESET.


7.67 TDO


Test Data Output 
Outputs serial test instructions and data 


Output from the i860 XP CPU (pin R10) Test Signal 


Output from the 82495XP (pin C4) 
Output from the 82490XP (pin 84) 
Synchronous to TCK 


7.67.1 SIGNAL DESCRIPTION 


TDO is the serial output used to shift JTAG
instructions and data out of the component. The shifting
of instructions and data occurs during the SHIFT-IR
and SHIFT-DR TAP controller states, respectively.
These states are selected using the TMS signal as
described in chapter 9.


When not in SHIFT-IR or SHIFT-DR state, TDO is 
driven to a high impedance state to allow connecting 
TDO of different devices in parallel. 


7.67.2 WHEN DRIVEN


TDO is driven on the falling edge of TCK during the
SHIFT-IR and SHIFT-DR TAP controller states. At
all other times TDO is driven to the high impedance
state.


7.67.3 RELATION TO OTHER SIGNALS


TDO is only driven when TMS and TCK have been
used to select the SHIFT-IR or SHIFT-DR states in
the TAP controller.


7.68 TMS 


Test Mode Select 

Controls testing by selecting mode of operation 
Input to the i860 XP CPU Test Signal 

Input to the 82495XP (pin P2)

Input to the 82490XP (pin 1) 

Synchronous to TCK 


7.68.1 SIGNAL DESCRIPTION


TMS is decoded by the JTAG TAP (Test Access
Port) to select the operation of the test logic, as
described in chapter 9.


To guarantee deterministic behavior of the TAP
controller, TMS is provided with an internal pull-up
resistor. If boundary scan is not used, TMS may be
tied high or left unconnected.


7.68.2 WHEN SAMPLED 
TMS is sampled on every rising edge of TCK. 


7.68.3 RELATION TO OTHER SIGNALS 


TMS is used to select the internal TAP states
required to load boundary scan instructions and data
on TDI.


For proper initialization of the JTAG logic, TMS
should be driven high, “1”, for at least four TCK
cycles following the rising edge of RESET.
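

The way TMS selects the SHIFT-IR and SHIFT-DR states can be illustrated with a software model of the TAP controller. The transition table below is the standard 16-state IEEE 1149.1 state machine, not a description taken from chapter 9 of this manual, and the helper names (`step`, `run`, `tdi_sampled`) are hypothetical. One call to `step` corresponds to one rising edge of TCK, on which TMS is sampled.

```python
# IEEE 1149.1 TAP controller: state -> (next if TMS=0, next if TMS=1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Advance one TCK rising edge with the given TMS value (0 or 1)."""
    return TAP_NEXT[state][tms]


def run(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK rising edge."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


def tdi_sampled(state):
    """TDI is only meaningful in the two shift states."""
    return state in ("Shift-IR", "Shift-DR")


state = run("Test-Logic-Reset", [0, 1, 1, 0, 0])
print(state, tdi_sampled(state))
```

Holding TMS high, which is what the internal pull-up does on an unconnected pin, keeps the model in Test-Logic-Reset once it gets there, so the test logic stays inactive when boundary scan is unused.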


7.69 Vcc and Vss


Power and Ground Pins


See Tables 1.1 and 1.2 for locations. 


7.70 WWOR# 


Weak Write Ordering Mode 

Enforces strong/weak write-ordering policy 
Input to 82495XP (pin Q2) Configuration Signal
Synchronous to CLK


7.70.1 SIGNAL DESCRIPTION 


When WWOR# is asserted during reset, the 82495XP
enforces a weak write-ordering policy. If WWOR# is
deasserted during reset, the 82495XP enforces a
strong write-ordering policy.


In a strong write-ordering mode, writes to the memory
bus are forced to occur in the order in which they
were posted by the CPU. In a weak write-ordering
mode it is possible for:


1. A CPU posted write (A) to be waiting in an
82495XP/82490XP memory buffer.


2. A subsequent CPU write (B) to complete in the
82495XP/82490XP because it was a hit to M or E
state.


3. A snoop hit to B to cause a write back of B before
A is written.


In this scenario, B is written to memory before A,
and thus CPU writes have been reordered.
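

The three-step scenario above can be replayed in a toy model. This sketch is an assumption-laden illustration, not the 82495XP mechanism: the function `memory_write_order` is hypothetical, the posted-write buffer is modeled as a simple queue, and the strong-ordering branch only shows the resulting order (A before B) without claiming how the controller enforces it.

```python
from collections import deque


def memory_write_order(weak_ordering):
    """Return the order in which writes A and B reach main memory."""
    posted = deque(["A"])   # step 1: write A waits in a memory buffer
    modified_lines = {"B"}  # step 2: write B hit an M/E line in the cache
    order = []

    snooped = "B"           # step 3: another bus master snoops line B
    if snooped in modified_lines:
        if weak_ordering:
            # The snoop write-back of B goes out ahead of the posted write.
            modified_lines.discard(snooped)
            order.append(snooped)
            order.extend(posted)
            posted.clear()
        else:
            # Assumed outcome under strong ordering: A reaches memory first.
            order.extend(posted)
            posted.clear()
            modified_lines.discard(snooped)
            order.append(snooped)
    return order


print(memory_write_order(weak_ordering=True))
print(memory_write_order(weak_ordering=False))
```

The CPU issued A before B in both runs; only the weak-ordering run lets memory observe them in the opposite order.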


7.70.2 WHEN SAMPLED 


WWOR# is sampled during reset, as shown in
Figure 7-1, with a setup time of 4 CPU clocks.
WWOR# becomes MALE once FSIOUT# indicates
that the 82495XP reset sequence has completed.


7.70.3 RELATION TO OTHER SIGNALS 


WWOR# shares a pin with MALE.


8.0 BUS FUNCTIONAL DESCRIPTION AND TIMING


The 82495XP/82490XP cache core supports a wide
variety of bus transfers to meet the needs of high
performance systems. Bus transfers can be single
cycle or multiple cycle, cacheable or non-cacheable,
64- or 128-bit (memory bus), and locked. To support
multiprocessing systems there are cache
back-invalidation, inquire, snooping, read for ownership,
cache to cache transfers, and locked cycles.


This section begins with read cycles, both cacheable
and non-cacheable. It moves on to write cycles,
cacheable and non-cacheable. Snooping cycles are
discussed next with an example of each snooping
mode. The remaining sections describe special
cycles: read for ownership, I/O, and locked cycles.


The cycles shown in this chapter are examples of
various types of 82495XP/82490XP cycles. The
purpose of these examples is to show signal
relationships; they are not necessarily best case
scenarios.


8.1 Read Cycles 


8.1.1 READ HITS


Read Hit cycles are executed completely within the
CPU/Cache core, and will not be seen by the MBC.


Figure 8-1. Cacheable Read Miss with Clean Replacement


8.1.2 CACHEABLE READ MISSES


8.1.2.1 Read Miss with Clean Replacement 
Figure 8.1 illustrates CPU initiated read cycles that
miss the 82495XP/82490XP cache and replace a
non-dirty (e.g. clean or empty) line in the cache. In
such cycles, the 82495XP will instruct the MBC to
perform a cache line-fill cycle on the memory bus. A
cache line-fill is a read of a complete
82495XP/82490XP line from main memory. The line
is then written into the 82490XP’s array, and data
transferred to the CPU as requested. If the line
fetched from main memory replaces an
82495XP/82490XP cache line which is in a valid
unmodified state ([E] or [S]), then a back-invalidation
cycle is performed on the CPU bus to guarantee that
the replaced data is also removed from the CPU’s
first level cache, thus maintaining the inclusion
property.


CACHE CONTROL SIGNALS:


The CPU initiates the read cycle to the
82495XP/82490XP cache where the cache tag
state is looked up. Once the 82495XP determines
the cycle to be a cache miss, it issues CADS#
(clock 2) and the associated cycle control signals to
the MBC (e.g. CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#) in order to schedule the cache line-fill
operation. MCACHE# is active, indicating that the
read miss is potentially cacheable by the 82495XP;
RDYSRC is active, indicating that the MBC must
supply BRDY#s to the CPU cache core.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 2 and 13 for the two cycles in this example)
and remains valid until after CNA# is sampled active
by the 82495XP (clocks 5 and 16). MALE and MBALE
may be used to hold the address as necessary.


The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 3), indicating that the cycle is
guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish
that cycle on the memory bus. Prior to this point, the
cycle can be aborted by a snoop hit in the cache.


CNA# is asserted by the MBC (clock 4) to indicate
that it is ready to schedule a new memory bus cycle.
Note that after CNA# activation, cycle control
signals are not guaranteed to be valid.


When the MBC has determined the cacheability
attribute of the cycle, it drives the MKEN# signal
accordingly. The MBC also drives the KWEND# signal
at this time, indicating the end of the cacheability
window. The 82495XP samples MKEN# during
KWEND# (clock 5) to determine that the cycle is
indeed cacheable.


The MBC asserts SWEND# when the snoop window
ends on the memory bus. The 82495XP samples
MWB/WT# and DRCTM# during SWEND#
(clock 7) and updates the cache tag state according
to the consistency protocol. The closure of the
snoop window also enables the MBC to start
providing the CPU with data that has been stored in the
82490XP’s memory cycle buffer. The MBC supplies
BRDY#s to the CPU (clocks 7-10).


The first cycle ends when CRDY# is driven active
by the MBC (clock 10). It is at this time that the data
in the 82490XP’s memory cycle buffers is loaded
into the cache SRAM.


The 82495XP issues a new CADS# in clock 13,
which also misses the 82495XP/82490XP cache.
Note that once the cycle progress signals (BGT#,
CNA#, KWEND#, SWEND#) of a cycle are
sampled asserted, the 82495XP ignores them until the
CRDY# of that cycle. The 82495XP does not
pipeline the cycle progress signals (i.e. it will not sample
them again until after CRDY# of the current memory
bus cycle).
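

The clock numbers quoted above for the first cycle of Figure 8-1 can be collected into a table and checked for ordering. This is only a sketch: the dictionary and the function `check_progress_order` are hypothetical, and the ordering CADS#, BGT#, KWEND#, SWEND#, CRDY# (each allowed in the same clock as its predecessor) is an interpretation of this chapter's descriptions rather than a rule stated in one place.

```python
# Clocks taken from the walk-through of the first read miss in Figure 8-1.
FIG_8_1_FIRST_CYCLE = {
    "CADS#": 2,
    "BGT#": 3,
    "CNA#": 4,
    "KWEND#": 5,
    "SWEND#": 7,
    "CRDY#": 10,
}


def check_progress_order(clocks):
    """Return a list of ordering problems (empty if none are found).

    Assumed ordering: CADS# <= BGT# <= KWEND# <= SWEND# <= CRDY#.
    CNA# is not constrained by this check.
    """
    order = ["CADS#", "BGT#", "KWEND#", "SWEND#", "CRDY#"]
    problems = []
    for earlier, later in zip(order, order[1:]):
        if clocks[earlier] > clocks[later]:
            problems.append(
                f"{later} (clock {clocks[later]}) precedes "
                f"{earlier} (clock {clocks[earlier]})"
            )
    return problems


print(check_progress_order(FIG_8_1_FIRST_CYCLE))
```

A check of this kind could be run against a memory bus controller simulation trace to catch a SWEND# returned before KWEND#, for example.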


MEMORY BUS SIGNALS:


The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory
address output enables (MAOE# and MBAOE#)
should be asserted by the MBC. MDOE# must be
inactive to allow the data pins to be used as inputs.


Some time after the address has been driven onto
the memory bus, data will be supplied from the
DRAM (main memory) to the 82490XP cache
SRAM.


For Clocked Memory Bus Mode, MSEL# is driven
active by the MBC (clock 4) to allow sampling of
MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.
MBRDY# is driven active by the MBC in clocks 4 to
6 to cause the memory burst counter to be
incremented and data to be placed into the 82490XP
cache memory cycle buffers. The MBC drives
MEOC# asserted (clock 7) to end the current cycle
on the memory bus and switch memory cycle buffers
for the new cycle. MZBT# is latched at this time
(when MEOC# is sampled asserted and MSEL#
remains low) for the next transfer.


MBRDY# is driven active by the MBC in clocks 15
to 17 to read data into the 82490XP cache memory 
cycle buffers. The MBC asserts MEOC# (clock 18) 
to end the second read miss cycle on the memory 
bus and switch the memory cycle buffers for a new 
cycle. 


For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 4) to allow MISTB
operation and to latch MZBT# (on the falling edge of
MSEL#) for the transfer. MISTB is toggled in clocks
5 to 7 to cause the memory burst counter to be
incremented, and data to be placed into the 82490XP
cache memory cycle buffers. Note: MISTB latches
the memory bus data on both the rising and falling
edges. The MBC drives MEOC# asserted (clock 8)
to end the current cycle on the memory bus and
switch memory cycle buffers for the new cycle.
MZBT# for the next cycle is sampled at this time on
the falling edge of MEOC#.


MISTB is toggled by the MBC (clocks 15 to 17) to
read data into the 82490XP memory cycle buffers.
The MBC asserts MEOC# (clock 18) to end the
second read miss cycle on the memory bus and switch
the memory cycle buffers for a new cycle.


8.1.2.2 Read Miss with Replacement of Dirty 
Line 


Figure 8.2 illustrates a CPU read cycle which misses
the 82495XP cache, and requires the replacement
of a modified line (e.g. tag replacement,
lines/sector=1, line ratio=1). In such cycles, the
82495XP will instruct the MBC to perform a cache
line-fill on the memory bus, instruct the 82490XP to
fill its write-back buffer with the contents of the array
location corresponding to the line which must be
replaced, and perform a back invalidation to the CPU
to maintain the first and second level cache
consistency. Once the cache line-fill has completed, the
82495XP/82490XP will write back the contents of
the replaced line to main memory from the 82490XP
write-back buffer.


CACHE CONTROL SIGNALS: 


The CPU initiates the read cycle to the
82495XP/82490XP cache where the cache tag
state is looked up. Once the 82495XP determines
the cycle to be a cache miss, it issues CADS#
(clock 1) and the associated cycle control signals to
the MBC (e.g. CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#) in order to schedule the cache line-fill
operation. MCACHE# is active, indicating that the
read miss is potentially cacheable by the 82495XP;
RDYSRC is active, indicating that the MBC must
supply BRDY#s to the CPU cache core.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 5 for the two cycles in this example)
and remains valid until after CNA# is sampled active
by the 82495XP (clocks 4 and 10). MALE and MBALE
may be used to hold the address as necessary.


The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 2), indicating that the cycle is
guaranteed to complete on the memory bus. At this
point, the 82490XP’s write-back buffer is prefilled
with the line to be replaced. Once the 82495XP
samples BGT# asserted, it must finish that cycle on the
memory bus. Prior to this point, the cycle can be
aborted by a snoop hit from another cache.


CNA# is asserted by the MBC (clock 3) to indicate 
that it is ready to schedule a new memory bus cycle. 
Note that after CNA# activation, cycle control sig- 
nals are not guaranteed to be valid. 


When the MBC has determined the cacheability at- 
tribute of the cycle, it drives the MKEN# signal ac- 
cordingly. The MBC also drives the KWEND# signal 
at this time, indicating the end of the cacheability 
window. The 82495XP samples MKEN# during 
KWEND# (clock 4) to determine that the cycle is 
indeed cacheable. 


The MBC asserts SWEND# (clock 6) when the
snoop window ends on the memory bus. The
closure of the snoop window enables the MBC to start
providing the CPU with data that has been stored in
the 82490XP’s memory cycle buffer. The MBC
supplies BRDY#s to the CPU (clocks 6-9) to serve the
read cycle. Note that data may be supplied to the
82490XPs immediately after MSEL# activation, and
need not wait for SWEND#.


On the memory bus, the 82495XP issues a
write-back (WB) cycle. CNA# is sampled active in clock 3
causing the 82495XP to issue the CADS# (also
CDTS#) of the write-back (clock 5). The MBC
knows this is a write back cycle and not a CPU
initiated write cycle by sampling MCACHE# asserted.
This tells the MBC how many data transfers are
necessary.


BGT#, CNA#, and KWEND# of the write-back are
sampled asserted by the MBC (clock 9) after the
CRDY# of the read miss cycle (clock 8). At this
point, the 82495XP may issue another CADS# for a
new (unrelated) memory bus cycle. It is at this time
that the data in the 82490XP’s memory cycle buffers
is loaded into the cache SRAM. The data to be
written back to main memory is in the 82490XP’s write
back buffers.


Figure 8-2. Cacheable Read Miss with Replacement of Dirty Line


The snoop window for the write back cycle is closed
by the MBC in clock 11, and the cycle is ended by
CRDY# sampled asserted in clock 13.


MEMORY BUS SIGNALS: 


The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory
address output enables (MAOE# and MBAOE#)
should be asserted by the MBC.


Some time after the address has been driven onto
the memory bus, data will be supplied from the
DRAM (main memory) to the 82490XP cache
SRAM.


For Clocked Memory Bus Mode, MSEL# is driven
active by the MBC (clock 3) to allow sampling of
MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.
MBRDY# is driven active by the MBC in clocks 3 to
5 to cause the memory burst counter to be
incremented and data to be placed into the 82490XP
cache memory cycle buffers. The MBC drives
MEOC# asserted (clock 6) to end the current cycle
on the memory bus and switch memory cycle buffers
for the new cycle. MZBT# is latched at this time
(when MEOC# is sampled asserted) for the next
transfer.


The MBC asserts the memory data output enable
signal (MDOE#, clock 8) to drive the memory data
outputs.


MBRDY# is driven active by the MBC in clocks 10
to 12 to write data from the 82490XP cache memory
cycle buffers onto the memory bus. The MBC
asserts MEOC# (clock 13) to end the write back cycle
on the memory bus and switch the memory cycle
buffers for a new cycle.


For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 4) to allow MISTB
operation and to latch MZBT# for the transfer (on
MSEL# falling edge). MISTB is toggled in clocks 5
to 7 to cause the memory burst counter to be
incremented, and data to be placed into the 82490XP
cache memory cycle buffers. Note: MISTB latches
the memory bus data on both the rising and falling
edges. The MBC drives MEOC# asserted (clock 8)
to end the current cycle on the memory bus and
switch memory cycle buffers for the new cycle.
MZBT# for the next cycle is latched at this time on
the falling edge of MEOC#.


The MBC asserts MDOE# (clock 9) to drive the 
memory data outputs. 


MOSTB is toggled by the MBC (clocks 10 to 12) to 
write data from the 82490XP memory cycle buffers 
onto the memory bus. The MBC asserts MEOC# 
(clock 13) to end the write back cycle on the memo- 
ry bus and switch the memory cycle buffers for a 
new cycle. 


8.1.3 NON-CACHEABLE READ MISSES 


8.1.3.1 Read Misses not Cacheable by CPU/ 
Cache Core and Cacheable by Core, but 
not by Memory Bus 


Figure 8.3 illustrates two CPU read cycles which
miss the 82495XP cache, and are non-cacheable. In
the first cycle, the CPU/Cache core forces the read
to be non-cacheable (as indicated by the
MCACHE# output from the 82495XP). In the
second cycle, non-cacheability of the data is forced by
the memory bus (as indicated by the MKEN# input
from the MBC). Since neither cycle is cacheable,
no line-fill operation is performed; the
cycles are merely echoed to the memory bus.
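

The two sources of non-cacheability can be summarized as a single decision. This is a minimal sketch with a hypothetical function name; the arguments are logical values (True means the active-low signal is asserted), and MKEN# is taken to be the value sampled when KWEND# closes the cacheability window.

```python
def read_miss_is_line_fill(mcache_active, mken_active):
    """Decide whether a read miss becomes a cache line-fill.

    mcache_active: MCACHE# driven active by the 82495XP, meaning the
                   CPU/Cache core considers the read potentially cacheable.
    mken_active:   MKEN# returned active by the MBC, meaning the memory
                   bus allows the line to be cached.
    Both must agree for a line-fill to take place.
    """
    return mcache_active and mken_active


# First cycle of Figure 8.3: non-cacheable by the CPU/Cache core.
print(read_miss_is_line_fill(mcache_active=False, mken_active=False))
# Second cycle of Figure 8.3: cacheable by the core, refused by the bus.
print(read_miss_is_line_fill(mcache_active=True, mken_active=False))
# Read miss of Figure 8.1: both agree, so a line-fill is performed.
print(read_miss_is_line_fill(mcache_active=True, mken_active=True))
```

When MCACHE# is inactive the cycle is already known to be non-cacheable, so the value of MKEN# has no influence on the result.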


CACHE CONTROL SIGNALS: 


The CPU initiates the first read cycle to the
82495XP/82490XP cache where the cache tag
state is looked up. Once the 82495XP determines
the cycle to be a cache miss, it issues a cycle
request (CADS# in clock 1) and the associated cycle
control signals to the MBC (e.g. CW/R#, CM/IO#,
CD/C#, RDYSRC, MCACHE#) in order to schedule
the read operation. RDYSRC is active, indicating
that the MBC must provide BRDY# to the CPU;
MCACHE# is not active, indicating that the read
miss is not cacheable by the CPU/Cache core.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 5 for the two cycles in this example)
and remains valid until after CNA# is sampled active
by the 82495XP (clocks 4 and 10). MALE and MBALE
may be used to hold the address as necessary.


Figure 8-3. Non-Cacheable Read Misses


The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 2), indicating that the cycle is
guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish
that cycle on the memory bus. Prior to this point, the
cycle can be aborted by a snoop hit from another
cache.


CNA# is asserted by the MBC (clock 3) to indicate
that it is ready to schedule a new memory bus cycle.
Note that after CNA# activation, cycle control
signals are not guaranteed to be valid.


This cycle has already been determined to be
non-cacheable; therefore, the MBC does not need to
assert SWEND#, KWEND#, or MKEN# to the
82495XP/82490XP cache. The MBC supplies
BRDY# to the CPU to complete the cycle to the
CPU. The MBC asserts CRDY# (clock 8) to the
82495XP/82490XP to complete the read miss cycle
on the memory bus.


The 82495XP issues a new (unrelated) cycle request
(CADS# in clock 5) which also misses the
82495XP/82490XP cache. Since the 82495XP has
already sampled CNA# asserted, it issues a new
CADS# prior to receiving CRDY# of the current
cycle (i.e. this cycle is pipelined within the MBC). Note
that once the cycle progress signals of a cycle are
sampled asserted, the 82495XP ignores them until
the CRDY# of that cycle. The 82495XP will not
sample the cycle progress signals again until after
the CRDY# of the current memory bus cycle. The
current read cycle is completed on the bus in clock 8
with CRDY# assertion.


The cycle progress signals for the second read miss
are also valid at this time (clock 5). RDYSRC is
active, indicating that the MBC must provide BRDY#s
to the CPU/Cache core; and MCACHE# is active,
indicating that the read miss is potentially cacheable
by the 82495XP/82490XP.


The MBC issues BGT# and CNA# to the 82495XP
in clock 9 to indicate that the cycle is guaranteed to
complete on the memory bus, and that it is ready to
schedule a new memory bus cycle. KWEND# is
asserted at this time to close the cacheability window.
MKEN# is not active, indicating to the 82495XP that
the read miss cycle is not cacheable by the memory
bus. KWEND# and MKEN# must be returned to the
82495XP at least two clocks prior to BRDY# to
inform the CPU that a line fill will not follow.


The MBC asserts SWEND# (clock 11) to close the
snoop window, and CRDY# (clock 13) to complete
the cycle to the 82495XP/82490XP. Note:
SWEND# is not needed since the cycle was not
cacheable.


NOTE: 
Both examples show single cycle read requests. 


MEMORY BUS SIGNALS: 


The memory address latch enables (MALE and 
MBALE) may remain asserted by the MBC to place 
the address latches in flow through mode. If the 
82495XP is the current bus master, the memory ad- 
dress output enables (MAOE# and MBAOE#) 
should be asserted by the MBC. The memory data 
output enable (MDOE#) must be inactive to allow 
the data pins to be used as inputs. 


Some time after the address has been driven onto 
the memory bus, data will be supplied from the 
DRAM (main memory) to the 82490XP memory cy- 
cle buffers. 


For Clocked Memory Bus Mode, MEOC# is
asserted by the MBC (clock 6) to latch MZBT# for the
next transfer, and end the current cycle on the
memory bus (MBRDY# and MSEL# are not necessary
since this example shows a single transfer cycle).
MZBT# is driven high by the MBC in order to force
the read cycle to begin with a non-zero burst
address.


For the second non-cacheable read cycle, MSEL#
is driven active by the MBC (clock 8) to allow
sampling of MBRDY# and to latch MZBT# for the
transfer. MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer. Again,
MZBT# is driven high by the MBC to force the
transfer to begin with the correct burst address.
MBRDY# is driven active by the MBC in clock 10 to
cause the memory burst counter to be incremented
and data to be placed into the 82490XP cache
memory cycle buffers. The MBC drives MEOC# asserted
(clock 12) to end the current cycle on the memory
bus and switch memory cycle buffers for the new
cycle.


For Strobed Memory Bus Mode, MEOC# is driven
active by the MBC (clock 5) to latch MZBT# for the
transfer (on MEOC# falling edge), and end the
current cycle on the memory bus (MISTB is not
necessary since this example shows a single transfer
cycle). MZBT# is driven high by the MBC in order to
force the read cycle to begin with the correct burst
address.


Figure 8-4. Write Hit to [S] State Line (Write-Through)


For the second non-cacheable read cycle, MSEL#
is driven active by the MBC (clock 8) to allow MISTB
operation and to latch MZBT# for the transfer (on
MSEL# falling edge). Again, MZBT# is driven high
by the MBC to force the transfer to begin with the
correct burst address. MISTB is toggled in clock 9 to
cause the memory burst counter to be incremented,
and data to be placed into the 82490XP cache
memory cycle buffers. Note: MISTB latches the memory
bus data on both the rising and falling edges. The
MBC drives MEOC# asserted (clock 13) to end the
current cycle on the memory bus and switch
memory cycle buffers for the new cycle. MZBT# for the
next cycle (not shown) is sampled at this time on
the falling edge of MEOC#.


8.2 Write Cycles 
8.2.1 WRITE HITS 


8.2.1.1 Write Hit to [E] or [M] States 


CPU initiated write cycles which hit 82495XP entries 
tagged in the [E] or [M] states are executed com- 
pletely within the CPU/Cache core, and will not be 
seen by the MBC. 


8.2.1.2 Write Hit to [S] State 


Figure 8-4 illustrates CPU initiated write cycles which
hit lines in the 82495XP/82490XP cache array that are in
the shared state. If the 82495XP/82490XP is used as a
write through cache (not write back), the [S] state is
the only state a cached line can be in. These cycles are
posted, as are all normal write cycles (as long as no
other write miss is pending).
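
The two write-hit cases can be summarized in a short sketch. This is an illustrative Python model only; the function name and the returned labels are assumptions made for the example and are not part of the 82495XP interface.

```python
def write_hit_action(tag_state):
    """Return how a CPU write hit is handled for a given tag state."""
    if tag_state in ("E", "M"):
        # Completed inside the CPU/cache core; never seen by the MBC.
        return "core"
    if tag_state == "S":
        # Posted, then written through to the memory bus via the MBC.
        return "posted write-through"
    raise ValueError(f"not a write hit: {tag_state!r}")
```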


CACHE CONTROL SIGNALS: 
The CPU initiates the write cycle to the 82495XP/82490XP
cache where the cache tag state is looked up. Once the
82495XP determines the cycle to be a hit to shared state,
it posts the write and returns BRDY# to the CPU.


The 82495XP next issues a cycle request (CADS# in
clock 1) and the associated cycle control signals to the
MBC (e.g. CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#, PALLC#)
in order to schedule the write through operation.
MCACHE# is not active since the write will be posted;
RDYSRC is not active, indicating that the 82495XP will
supply BRDY# to the CPU; PALLC# is not active, indicating
that an allocation cycle will not be performed
(regardless of MKEN# state) since the line is already
available in the cache. The MBC must also latch PWT and
PCD on the BLE# falling edge in order to track hits and
misses to the [S] state. This is how an external state
tracker can track the [S] state.
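
As a rough sketch of how an MBC design might interpret two of these cycle control signals, the helper below maps RDYSRC and PALLC# to the behavior described in this chapter. The function name, the boolean "active" convention, and the dictionary keys are hypothetical and chosen only for illustration.

```python
def describe_write_request(rdysrc_active, pallc_active):
    """Interpret RDYSRC and PALLC# as sampled with CADS# (True = active)."""
    return {
        # RDYSRC active: the MBC must supply BRDY# to the CPU.
        # RDYSRC inactive: the 82495XP supplies BRDY# itself.
        "brdy_source": "MBC" if rdysrc_active else "82495XP",
        # PALLC# active: the write may be followed by an allocation.
        "potentially_allocatable": pallc_active,
    }
```

For the write hit to [S] state described here, both signals are inactive, so the 82495XP supplies BRDY# and no allocation follows.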


The memory bus address (MSET[10:0], MTAG[11:0],
MCFA[6:0]) is valid with CADS# (clocks 1 and 6 for the
two cycles in this example) and remains valid until after
CNA# is sampled active by the 82495XP (clocks 4 and 9).
MALE and MBALE may be used to hold the address as
necessary.


The MBC arbitrates for the memory bus and returns BGT#
asserted (clock 2), indicating that the cycle is
guaranteed to complete on the memory bus. Once the
82495XP samples BGT# asserted, it must finish that cycle
on the memory bus. Prior to this point, the cycle can be
aborted by a snoop hit from another cache.
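
The role of BGT# as a commit point can be pictured as a small state machine. The class below is a hypothetical model written for this explanation, not a description of the 82495XP internals; the state names are assumptions.

```python
class BusCycle:
    """Toy model of one memory bus cycle request."""

    def __init__(self):
        # CADS# has been issued, BGT# has not yet been sampled asserted.
        self.state = "requested"

    def sample_bgt(self):
        """BGT# sampled asserted: the cycle must now complete."""
        if self.state == "requested":
            self.state = "committed"

    def snoop_hit(self):
        """A snoop hit can abort the cycle only before BGT#."""
        if self.state == "requested":
            self.state = "aborted"
        return self.state
```

A cycle that has already seen BGT# stays "committed" when a snoop hit arrives, while one still waiting for BGT# becomes "aborted".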


CNA# is asserted by the MBC (clock 3) to indicate that it
is ready to schedule a new memory bus cycle. Note that
after CNA# activation, cycle control signals are not
guaranteed to be valid. KWEND# is also driven at this
time since the cacheability of this cycle is already
known and MKEN# is a don’t care. It is not necessary that
KWEND# be asserted at this time.


The 82495XP provides BRDY# to the CPU since the cycles
are posted writes. The MBC completes the first write hit
to [S] state in clock 5 when it asserts CRDY# to the
82495XP/82490XP cache. The data is latched into the
82490XP array from the memory cycle buffer at this time.


In this example, the 82495XP issues a second write to [S]
state in clock 6. For this cycle, the 82495XP issues the
memory bus request (CADS#) as soon as it can after
sampling CNA# asserted. The 82495XP will not wait for
KWEND# (if it does not get asserted immediately as in
this example) to issue CADS# since this is not a
potential allocate cycle (i.e. PALLC# is inactive).


The MBC asserts BGT#, CNA#, and KWEND# together in
clock 8 to indicate that the current cycle is guaranteed
to complete and the 82495XP is free to schedule a new
memory bus cycle.


Again, the 82495XP provides BRDY# to the CPU 
since the cycles are posted writes. The MBC com- 
pletes the second write hit to [S] state in clock 12 
when it asserts CRDY# to the 82495XP/82490XP 
cache. The data is latched in to the 82490XP array 
from the memory cycle buffer at this time. 
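
The clock numbers quoted for the two write hits can be written down as a small table and checked for consistency. This is only a sketch; the dictionary layout and the two ordering rules it checks are assumptions drawn from the description, not a timing specification.

```python
# Clock numbers as given in the text for the two write hits to [S] state.
CYCLES = [
    {"CADS#": 1, "BGT#": 2, "CNA#": 3, "CRDY#": 5},
    {"CADS#": 6, "BGT#": 8, "CNA#": 8, "CRDY#": 12},
]


def check_pipelining(cycles):
    """Check signal ordering within each cycle and between cycles."""
    for i, cycle in enumerate(cycles):
        # Within a cycle: request, then bus grant, then completion.
        assert cycle["CADS#"] <= cycle["BGT#"] <= cycle["CRDY#"]
        if i > 0:
            # A new CADS# follows CNA# of the previous cycle.
            assert cycle["CADS#"] > cycles[i - 1]["CNA#"]
    return True
```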
MEMORY BUS SIGNALS:

The memory address latch enables (MALE and MBALE) may
remain asserted by the MBC to place the address latches
in flow through mode. If the 82495XP is the current bus
master, the memory address output enables (MAOE# and
MBAOE#) should be asserted by the MBC.


For Clocked Memory Bus Mode, the memory data output
enable signal (MDOE#) is asserted by the MBC in clock 2
to drive the memory data outputs.

MEOC# is asserted by the MBC (clock 4) to latch MZBT# for
the transfer and end the current cycle on the memory bus
(MBRDY# is not necessary since this example shows a
single transfer cycle). MZBT# is driven high by the MBC
in order to force the write cycle to begin with the
correct burst address. MFRZ# is sampled here (it need not
be active since the cycle is not potentially
allocatable).


For the second write through cycle, MSEL# is driven
active by the MBC (clock 7) to allow sampling of MBRDY#
and to latch MZBT# for the transfer. MZBT# is sampled on
all MCLK edges where MSEL# is inactive. Once MSEL# is
sampled active by the 82495XP, the value of MZBT# sampled
on the prior MCLK is used for the next transfer. Again,
MZBT# is driven high by the MBC to force the transfer to
begin with the correct burst address. MBRDY# is driven
active by the MBC in clock 10 to cause the memory burst
counter to be incremented and data to be placed into the
82490XP cache memory cycle buffers. The MBC drives MEOC#
asserted (clock 12) to end the current cycle on the
memory bus and switch memory cycle buffers for the new
cycle.


For Strobed Memory Bus Mode, the memory data output
enable (MDOE#) is asserted by the MBC in clock 2 to drive
the memory data outputs.


MEOC# is driven active by the MBC (clock 4) to latch
MZBT# for the transfer (on the MEOC# falling edge) and
end the current cycle on the memory bus (MOSTB is not
necessary since this example shows a single transfer
cycle). MZBT# is driven high by the MBC in order to force
the write cycle to begin with the correct burst address.


For the second write through cycle, MSEL# is driven
active by the MBC (clock 6) to allow MOSTB operation and
to latch MZBT# for the transfer (on the MSEL# falling
edge). Again, MZBT# is driven high by the MBC to force
the transfer to begin with the correct burst address.
MOSTB is toggled in clock 9 to cause the memory burst
counter to be incremented and data to be placed into the
82490XP cache memory cycle buffers. Note: MOSTB latches
the memory bus data on both the rising and falling edges.
The MBC drives MEOC# asserted (clock 11) to end the
current cycle on the memory bus and switch memory cycle
buffers for the new cycle. MZBT# for the next cycle (not
shown) is sampled at this time on the falling edge of
MEOC#.


8.2.2 WRITE MISSES 


8.2.2.1 Write Miss with no Allocation 


Figure 8-5 illustrates two CPU initiated write cycles
which miss the 82495XP/82490XP cache and are not
allocatable. The first write cycle begins as a
potentially allocatable cycle, but MKEN# sampled inactive
indicates that the cycle is not cacheable by the memory
bus. The second write miss cycle is not cacheable by the
CPU/82495XP/82490XP as indicated by the PALLC# output
from the 82495XP.


CACHE CONTROL SIGNALS: 


The CPU initiates the first write cycle to the
82495XP/82490XP cache where the cache tag state is looked
up. Once the 82495XP determines the cycle to be a cache
miss, it issues a cycle request (CADS# in clock 1) and
the associated cycle control signals to the MBC (e.g.
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#, PALLC#) in order
to schedule the write miss operation. RDYSRC is not
active, indicating that the 82495XP will supply BRDY# to
the CPU; MCACHE# is not active; PALLC# is active,
indicating that the cycle is potentially allocatable.


The write miss data is posted in the 82490XP’s memory
cycle buffer, and the cycle completes with no wait states
to the CPU. The CPU is then free to issue another
(non-related) cycle while the 82495XP completes the
current write miss cycle and possible allocation. If this
new cycle is a cache hit, it will be serviced by the
82495XP immediately; but if it is a cache miss, its
service will wait until the CRDY# of the write cycle (and
allocation cycle, if executed).


The memory bus address (MSET[10:0], MTAG[11:0],
MCFA[6:0]) is valid with CADS# (clocks 1 and 7 for the
two cycles in this example) and remains valid until after
CNA# is sampled active by the 82495XP (clocks 4 and 10).
MALE and MBALE may be used to hold the address as
necessary.

Figure 8-5. Write Miss with No Allocation

The MBC arbitrates for the memory bus and returns BGT#
asserted (clock 2), indicating that the write through
cycle is guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish that
cycle on the memory bus. Prior to this point, the cycle
can be aborted by a snoop hit from another cache.


CNA# is asserted by the MBC (clock 3) to indicate that it
is ready to schedule a new memory bus cycle. Notice that
the cycle control signals are not guaranteed to be valid
after CNA# activation. Note that CNA# has no effect
before KWEND#.


When the MBC has determined the cacheability attribute of
the write through cycle, it drives the MKEN# signal
accordingly. The MBC also drives the KWEND# signal at
this time (clock 4), indicating the end of the
cacheability window. The 82495XP samples MKEN# inactive
during KWEND#, indicating that the missed cycle is not
cacheable and should not be allocated.
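
The allocation decision for a write miss depends on two inputs: PALLC# from the 82495XP and MKEN# as sampled when KWEND# closes the cacheability window. A minimal sketch of that decision follows; the function and parameter names are hypothetical.

```python
def will_allocate(pallc_active, mken_active_at_kwend):
    """Return True if a write miss is followed by an allocation."""
    if not pallc_active:
        # Not potentially allocatable: MKEN# is a don't care.
        return False
    # Potentially allocatable: the memory bus decides through MKEN#,
    # sampled when KWEND# ends the cacheability window.
    return mken_active_at_kwend
```

With these inputs, the first cycle of Figure 8-5 (PALLC# active, MKEN# inactive) and the second (PALLC# inactive) both return False, so neither write is followed by a line fill.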


The MBC asserts SWEND# (clock 6) when the snoop window of
the write through cycle ends on the memory bus. The MBC
may return CRDY# to the 82495XP/82490XP cache any time
after the closure of the snoop window. In this example,
CRDY# is issued by the MBC in clock 8.


The 82495XP issues a cycle request for the second write
miss cycle in clock 7. The cycle control signals are
valid at this time. Note that PALLC# is inactive,
indicating that the 82495XP/82490XP has determined the
cycle to not be allocatable.


The MBC asserts BGT#, CNA#, and KWEND# in clock 9. MKEN#
is a don’t care during the cacheability window since the
cycle is not allocatable. The snoop window is closed in
clock 11, and the cycle is completed on the memory bus in
clock 13 with the assertion of CRDY# by the MBC.


MEMORY BUS SIGNALS: 


The memory address latch enables (MALE and MBALE) may
remain asserted by the MBC to place the address latches
in flow through mode. If the 82495XP is the current bus
master, the memory address output enables (MAOE# and
MBAOE#) should be asserted by the MBC.


For Clocked Memory Bus Mode, the memory data output
enable (MDOE#) is asserted by the MBC in clock 4 to drive
the memory data outputs.

MEOC# is asserted by the MBC (clock 5) to latch MZBT# for
the transfer and end the current cycle on the memory bus
(MBRDY# is not necessary since this example shows a
single transfer cycle). MZBT# is driven high by the MBC
in order to force the write cycle to begin with the
correct burst address. MFRZ# is sampled here (it need not
be active since the cycle is not potentially
allocatable).


For the second non allocatable write cycle, MSEL# is
driven active by the MBC (clock 8) to allow sampling of
MBRDY# and to latch MZBT# for the transfer. MZBT# is
sampled on all MCLK edges where MSEL# is inactive. Once
MSEL# is sampled active by the 82495XP, the value of
MZBT# sampled on the prior MCLK is used for the next
transfer. Again, MZBT# is driven high by the MBC to force
the transfer to begin with the correct burst address.
MBRDY# is driven active by the MBC in clock 10 to cause
the memory burst counter to be incremented and data to be
placed into the 82490XP cache memory cycle buffers.


The MBC drives MEOC# asserted (clock 13) to end the
current cycle on the memory bus and switch memory cycle
buffers for the new cycle. MFRZ# is sampled here (it need
not be active since the cycle is not potentially
allocatable). MZBT# is also sampled at this time.


For Strobed Memory Bus Mode, the memory data output
enable (MDOE#) is asserted by the MBC in clock 2 to drive
the memory data outputs.


MEOC# is driven active by the MBC (clock 5) to latch
MZBT# for the transfer and end the current cycle on the
memory bus (MOSTB is not necessary since this example
shows a single transfer cycle). MZBT# is driven high by
the MBC in order to force the write cycle to begin with
the correct burst address.


For the second write through cycle, MSEL# is driven
active by the MBC (clock 8) to allow MOSTB operation and
to latch MZBT# for the transfer. Again, MZBT# is driven
high by the MBC to force the transfer to begin with the
correct burst address. MOSTB is toggled in clock 12 to
cause the memory burst counter to be incremented and data
to be read from the 82490XP cache memory cycle buffers.
Note: MOSTB latches the memory bus data on both the
rising and falling edges.
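
Because the strobe acts on both edges, every change of level counts, not only rising edges. The sketch below counts strobe edges in a sequence of sampled levels. It assumes, for illustration only, that each edge corresponds to one transfer; the function name is hypothetical.

```python
def count_strobe_transfers(levels):
    """Count strobe edges in a sequence of sampled levels (0 or 1).

    Every change of level, rising or falling, counts as one edge.
    """
    return sum(1 for prev, cur in zip(levels, levels[1:]) if prev != cur)
```

For example, the level sequence 0, 1, 0, 1 contains three edges, whereas a rising-edge-only strobe would see two.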


The MBC drives MEOC# asserted (clock 13) to end the
current cycle on the memory bus and switch memory cycle
buffers for the new cycle. MZBT# and MFRZ# for the next
cycle (not shown) are sampled at this time on the falling
edge of MEOC#.

Figure 8-6. Write Miss with Allocation to [M] Line

8.2.2.2 Write Miss with Allocation

Figure 8-6 illustrates a CPU initiated write cycle which
misses the 82495XP/82490XP cache and follows the write to
main memory with an allocation cycle. An allocation is
when the cache follows a write miss cycle with a line
fill. This example assumes that allocating the new line
requires the replacement of a modified line (i.e. a
write-back to main memory).


CACHE CONTROL SIGNALS: 


The CPU initiates the write cycle to the 82495XP/82490XP
cache where the cache tag state is looked up. Once the
82495XP determines the cycle to be a cache miss, it
issues CADS# (clock 1) and the associated cycle control
signals to the MBC (e.g. CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#, PALLC#) in order to schedule the write
operation. MCACHE# is not active; RDYSRC is not active,
indicating that the 82495XP will supply BRDY#s to the
CPU; PALLC# is asserted, indicating a potential allocate
cycle after the write-through cycle.


The write miss data is posted in the 82490XP’s memory
cycle buffer, and the cycle completes with no wait states
to the CPU. The CPU is free to issue another
(non-related) cycle while waiting for the 82495XP to
complete the allocation. If this new cycle is a cache
hit, it will be serviced by the 82495XP immediately; but
if it is a cache miss, its service will wait until the
CRDY# of the allocation.


The memory bus address (MSET[10:0], MTAG[11:0],
MCFA[6:0]) is valid with CADS# (clocks 1, 5 and 10 for
the three cycles in this example) and remains valid until
after CNA# is sampled active by the 82495XP (clocks 4, 10
and 15). MALE and MBALE may be used to hold the address
as necessary.


The MBC arbitrates for the memory bus and returns BGT#
asserted (clock 2), indicating that the write through
cycle is guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish that
cycle on the memory bus. Prior to this point, the cycle
can be aborted by a snoop hit from another cache.


CNA# is asserted by the MBC (clock 3) to indicate that it
is ready to schedule a new memory bus cycle. Note that
after CNA# activation, cycle control signals are not
guaranteed to be valid.


When the MBC has determined the cacheability attribute of
the write through cycle, it drives the MKEN# signal
accordingly. The MBC also drives the KWEND# signal at
this time, indicating the end of the cacheability window.
The 82495XP samples MKEN# active during KWEND# (clock 4),
indicating that the missed line should be allocated in
the cache.


At the first available time (clock 5), the 82495XP
asserts CADS# to request an allocation cycle. The cycle
control signals are valid at this point: MCACHE# is
active, indicating the cacheability of the line-fill
cycle; RDYSRC is not active, indicating that the MBC need
not supply BRDY#s to the CPU (no BRDY#s are necessary for
an allocation cycle).


The MBC asserts SWEND# (clock 6) when the snoop window of
the write through cycle ends on the memory bus.

The MBC may return CRDY# to the 82495XP/82490XP cache any
time after the closure of the snoop window. In this
example, CRDY# is issued by the MBC in clock 8. At this
time, the cycle progress signals for the allocation cycle
may be issued by the MBC to complete the line fill.


Once again, the MBC arbitrates for the memory bus and
returns BGT# asserted (clock 9) for the allocation cycle.
The MBC also asserts CNA# and KWEND# at this time. The
82495XP back-invalidates the CPU to maintain first and
second level cache consistency.


In clock 10, the 82495XP asserts CADS# for the write back
cycle (since the miss was to a dirty line). CDTS# is
asserted by the 82495XP two clocks later (clock 12). Note
that CDTS# of the write back cycle is not asserted with
CADS# since the data is not yet available in the
82490XP’s write-back buffer.


The MBC asserts SWEND# (clock 11) when the snoop window
of the allocation cycle ends on the memory bus.

At this time, the MBC may assert CRDY# to the
82495XP/82490XP cache for the allocation cycle. CRDY#
assertion will cause the data stored in the 82490XP’s
memory cycle buffers to be latched into the cache array.


On the memory bus, BGT#, CNA#, and KWEND# are sampled
active in clock 14 for the write back cycle. The snoop
window is closed two clocks later (clock 16) by the MBC
with SWEND#, and the write back cycle is completed with
CRDY# asserted in clock 18.
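
The three cycles of this example (write through, allocation, write back) can be tabulated from the clock numbers quoted in the text and checked for a consistent order of cycle progress signals. This is a sketch only: the table layout and the ordering in ORDER are assumptions drawn from this example, and events that the text does not pin to a clock are left out.

```python
# Clock numbers quoted in the text for Figure 8-6.
FIG_8_6 = {
    "write-through": {"CADS#": 1, "BGT#": 2, "KWEND#": 4,
                      "SWEND#": 6, "CRDY#": 8},
    "allocation": {"CADS#": 5, "BGT#": 9, "KWEND#": 9, "SWEND#": 11},
    "write-back": {"CADS#": 10, "BGT#": 14, "KWEND#": 14,
                   "SWEND#": 16, "CRDY#": 18},
}

# Assumed progression of one cycle, from request to completion.
ORDER = ["CADS#", "BGT#", "KWEND#", "SWEND#", "CRDY#"]


def in_order(events):
    """True if the events present occur in non-decreasing clock order."""
    clocks = [events[name] for name in ORDER if name in events]
    return all(a <= b for a, b in zip(clocks, clocks[1:]))
```

Note that the allocation request (CADS# in clock 5) is issued before the write through cycle completes (CRDY# in clock 8), which is the pipelining that CNA# makes possible.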
MEMORY BUS SIGNALS:

The memory address latch enables (MALE and MBALE) may
remain asserted by the MBC to place the address latches
in flow through mode. If the 82495XP is the current bus
master, the memory address output enables (MAOE# and
MBAOE#) should be asserted by the MBC.


For Clocked Memory Bus Mode, the memory data output
enable (MDOE#) has been asserted by the MBC to drive the
memory data outputs.


MEOC# is asserted by the MBC (clock 4) to latch MZBT# for
the transfer and end the current cycle on the memory bus
(MBRDY# is not necessary since this example shows a
single transfer write miss cycle). MZBT# is driven high
by the MBC in order to force the write cycle to begin
with the correct burst address. MFRZ# is driven inactive
by the MBC here, allowing the line to be placed into the
exclusive ([E]) state and requiring the data to be
written to main memory.


For the allocation (line fill) cycle, MSEL# is driven
active by the MBC (clock 6) to allow sampling of MBRDY#
and to latch MZBT# for the transfer. MZBT# is sampled on
all MCLK edges where MSEL# is inactive. Once MSEL# is
sampled active by the 82495XP, the value of MZBT# sampled
on the prior MCLK is used for the next transfer. MDOE# is
also deasserted in clock 6 to allow the data pins to be
used as inputs for the allocation cycle.

MBRDY# is driven active by the MBC in clocks 7 to 9 to
cause the memory burst counter to be incremented and data
to be placed into the 82490XP cache memory cycle buffers.
The MBC drives MEOC# asserted (clock 10) to end the
allocation cycle on the memory bus and switch memory
cycle buffers for the new cycle. MZBT# is sampled and
latched at this time for the next data transfer.


MDOE# is asserted by the MBC (clock 12) to drive the
memory data outputs for the write back cycle.

The MBC again asserts MBRDY# (clocks 13 to 15) for the
write back cycle to increment the memory burst counter
and cause data to be read from the 82490XP memory cycle
buffers. The write back cycle ends on the memory bus and
switches memory cycle buffers with MEOC# assertion
(clock 16). MZBT# and MFRZ# for the next transfer are
sampled at this time. MFRZ# need not be active since the
cycle is not potentially allocatable.


For Strobed Memory Bus Mode, the memory data output
enable (MDOE#) has been asserted by the MBC to drive the
memory data outputs for the write miss cycle.

MEOC# is driven active by the MBC (clock 4) to latch
MZBT# for the transfer and end the current cycle on the
memory bus (MOSTB is not necessary since this example
shows a single transfer cycle). MZBT# is driven high by
the MBC in order to force the write cycle to begin with
the correct burst address. MFRZ# is driven deasserted by
the MBC here, allowing the line to be placed into the
exclusive ([E]) state.


For the allocation (line fill) cycle, MSEL# is driven
active by the MBC (clock 6) to allow MISTB operation and
to latch MZBT# for the transfer. MISTB is toggled in
clocks 8 to 10 to cause the memory burst counter to be
incremented and data to be placed into the 82490XP cache
memory cycle buffers. Note: MISTB latches the memory bus
data on both the rising and falling edges. MDOE# is also
deasserted in clock 6 to allow the data pins to be used
as inputs for the allocation cycle.

The MBC drives MEOC# asserted (clock 11) to end the
allocation cycle on the memory bus and switch memory
cycle buffers for the new cycle. MZBT# for the next cycle
is latched at this time on the falling edge of MEOC#.


MDOE# is asserted by the MBC (clock 14) to drive the
memory data outputs for the write back cycle.

The MBC toggles MOSTB (clocks 15 to 17) for the write
back cycle to increment the memory burst counter and
cause data to be read from the 82490XP memory cycle
buffers.

The write back cycle ends on the memory bus and switches
memory cycle buffers with MEOC# assertion (clock 18).
MZBT# and MFRZ# for the next transfer are sampled at this
time. MFRZ# need not be active since the cycle is not
potentially allocatable.


8.3 Snooping Cycles 


8.3.1 SYNCHRONOUS SNOOPING MODE
(HIT TO [M] LINE)

Figure 8-7 illustrates a snoop hit to a dirty line
sequence occurring simultaneously with a CPU initiated
read miss cycle. This example assumes synchronous
snooping mode (i.e. requests for snoops are done via
SNPSTB# from the MBC, sampled on the 82495XP’s CLK).

Figure 8-7. Synchronous Snooping Mode

CACHE CONTROL SIGNALS:

In clock 1 SNPSTB# is asserted by the MBC, indicating to
the 82495XP a request for snooping. The 82495XP samples
MAOE# (it must be inactive) in order to recognize the
snoop request. It is latched together with the snoop
address (MSET[0:10], MTAG[0:11], MCFA[0:6]), SNPINV,
MBAOE#, and SNPNCA on the 82495XP’s CLK during SNPSTB#
assertion. The tag look-up is done immediately after
SNPSTB# is sampled active since snoop operations have the
highest priority in the cache tag state arbiter. The
82495XP issues SNPCYC# (clock 2), indicating that the
snoop look-up is in progress. The results of the look-up
are driven to the memory bus via MTHIT# and MHITM# in the
next clock after SNPCYC#. Since the snoop hit a modified
line, both signals are asserted (clock 3). SNPBSY# is
also issued to indicate that the 82495XP is busy with CPU
back-invalidations, the 82490XP’s snoop buffer is full,
or a write back is to follow. The 82495XP will accept
snoops only when SNPBSY# is inactive.
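
The snoop look-up result can be sketched as a mapping from tag state to the two result signals. The modified-line case (both asserted) comes from this example and the miss case (both inactive) from the clocked snooping example in Section 8.3.2; the hit to an unmodified line (MTHIT# only) is an assumption not illustrated in these examples. The function name and state letters are hypothetical.

```python
def snoop_response(tag_state):
    """Return (MTHIT# asserted, MHITM# asserted) for a snooped line.

    tag_state is "M", "E", "S", or "I" (no valid match in the cache).
    """
    # Assumption: any valid matching line reports a tag hit.
    hit = tag_state in ("M", "E", "S")
    # Only a modified line reports hit-modified (a write back follows).
    hit_modified = tag_state == "M"
    return hit, hit_modified
```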


Simultaneously with the memory bus activity due to the
snoop request, the CPU initiates a read miss cycle. The
82495XP issues a memory bus request (CADS#), CDTS#, and
cycle control signals to the MBC in clock 3. The MBC must
wait for the pending snoop cycle to complete on the
memory bus prior to servicing this read miss cycle.


The memory bus address (MSET[10:0], MTAG[11:0],
MCFA[6:0]) is not valid until MAOE# goes active after
CRDY# of the snoop write back cycle is sampled active by
the 82495XP and the CADS# is reissued (clock 13).


In clock 4 the 82495XP issues SNPADS# and cycle control
signals to the MBC, indicating a request to flush a
modified line out of the cache. SNPADS# activation causes
the MBC to abort the pending read miss cycle. It is the
82495XP’s responsibility to re-issue the aborted cycle
after the completion of the write back, since BGT# was
not asserted by the MBC.


Data is loaded into the 82490XP’s snoop buffer. Since
SNPINV was sampled asserted by the 82495XP (clock 1)
during SNPSTB# assertion, it back-invalidates the CPU’s
first level cache.


The 82495XP asserts CDTS# (clock 8), indicating to the
MBC that data is available in the snoop buffer. When the
MBC completes the write back cycle on the memory bus, it
activates CRDY# to the 82495XP/82490XP cache. At this
time, the 82495XP deasserts SNPBSY# (clock 13) and
re-issues the aborted read miss cycle (clock 13) by
asserting CADS# and CDTS#.

MEMORY BUS SIGNALS:

For Clocked Memory Bus Mode, the memory data output
enable (MDOE#) is not activated by the MBC to allow the
memory data pins to be used as inputs.


MSEL# is driven active by the MBC (clock 4) to allow
sampling of MBRDY# and to latch MZBT# for the read miss
transfer. MZBT# is sampled on all MCLK rising edges where
MSEL# is inactive. Once MSEL# is sampled active by the
82495XP, the value of MZBT# sampled on the prior MCLK is
used for the next transfer.
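
The MZBT# sampling rule for Clocked Memory Bus Mode can be modelled in a few lines. The sketch below walks through one sample per MCLK edge and returns the MZBT# level that would be used for the transfer. The function name and the list-based representation are assumptions for illustration.

```python
def latched_mzbt(msel_active, mzbt_level):
    """Return the MZBT# level used for the next transfer.

    msel_active: one boolean per MCLK edge (True = MSEL# sampled active).
    mzbt_level:  the MZBT# pin level (0 or 1) at the same edges.
    """
    last_sample = None
    for active, level in zip(msel_active, mzbt_level):
        if active:
            # MSEL# sampled active: use the value from the prior MCLK.
            return last_sample
        # MSEL# inactive: keep sampling MZBT# on this edge.
        last_sample = level
    return None  # MSEL# never went active in this window
```

For instance, with MSEL# inactive for two edges and then active, and MZBT# levels 1, 0, 1 on those edges, the value used is 0, the level sampled on the edge just before MSEL# was seen active.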


Since the read miss cycle is aborted due to the 
snoop hit to a modified line (requires a write back 
cycle), no MEOC# is given. MSEL# is deasserted 
by the MBC (clock 6) and reasserted (clock 8) to 
allow latching of MZBT# for the snoop write back 
cycle and sampling of MBRDY# for that cycle. 
MFRZ# is also sampled at this time. 


The memory data output enable (MDOE#) signal is driven
active by the MBC (clock 7) to drive the memory data
outputs.

MBRDY# is driven active by the MBC in clocks 10 to 12 to
cause the memory burst counter to be incremented and data
to be written from the 82490XP cache snoop buffers. The
MBC drives MEOC# asserted (clock 13) to end the write
back cycle on the memory bus and switch memory cycle
buffers for the new cycle. MZBT# and MFRZ# are sampled
and latched at this time for the next data transfer.

MDOE# is deasserted by the MBC (clock 14) to allow the
memory data pins to be used as inputs for the reissued
read cycle.


For Strobed Memory Bus Mode, the memory data 
output enable (MDOE#) has not been asserted by 
the MBC to allow the memory data pins to be used 
as inputs for the read miss cycle. 


MSEL# is asserted by the MBC (clock 4) to allow sampling
of MISTB and latch MZBT# (on the falling edge of MSEL#)
for the read miss transfer.

Since the read miss cycle is aborted due to the snoop hit
to a modified line (requires a write back cycle), no
MEOC# is given. MSEL# is deasserted by the MBC (clock 5)
and reasserted (clock 6) to allow latching of MZBT# for
the snoop write back cycle and sampling of MOSTB for that
cycle. MFRZ# is also sampled at this time.


MOSTB is toggled in clocks 11 to 13 to cause the memory
burst counter to be incremented and data to be read from
the 82490XP cache memory cycle buffers. Note: MOSTB
latches the memory bus data on both the rising and
falling edges. The MBC drives MEOC# asserted (clock 14)
to end the snoop write back cycle on the memory bus and
switch memory cycle buffers for the new cycle. MZBT# and
MFRZ# for the next cycle are latched at this time on the
falling edge of MEOC#.

MDOE# is deasserted by the MBC (clock 14) to allow the
memory data pins to be used as inputs for the reissued
read miss cycle.


8.3.2 CLOCKED SNOOPING MODE

Figure 8-8 illustrates a CPU initiated read cycle which
misses the 82495XP/82490XP cache and whose subsequent
line fill replaces non dirty data (e.g. clean or empty).
Simultaneous with the read request to the MBC, that
device initiates a snoop to the 82495XP which misses that
line in the cache. The snoop is the result of a write
cycle on the memory bus by some other cache core; the
snoop invalidation signal (SNPINV) is therefore asserted
to this 82495XP. This example assumes Clocked Snooping
Mode (i.e. the requests for snoops are done via SNPSTB#
from the MBC, sampled on the MBC’s SNPCLK).

CACHE CONTROL SIGNALS:

The CPU initiates the read cycle to the 82495XP/82490XP
cache where the cache tag state is looked up. Once the
82495XP determines the cycle to be a cache miss, it
issues CADS# (clock 1) and the associated cycle control
signals to the MBC (e.g. CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#) in order to schedule the cache line-fill
operation. MCACHE# is active, indicating that the read
miss is potentially cacheable by the 82495XP; RDYSRC is
active, indicating that the MBC must supply BRDY#s to the
CPU cache core.


In clock 3, SNPSTB# is asserted by the MBC, indicating to
the 82495XP a request for snooping. MAOE# is deasserted
to allow the forthcoming snoop (the 82495XP will not
recognize the snoop if MAOE# is active). It is latched
together with the snoop address (MSET[0:10], MTAG[0:11],
MCFA[0:6]), SNPINV, MBAOE#, and SNPNCA on the MBC’s
SNPCLK rising edge during SNPSTB# assertion. SNPINV is
asserted by the MBC since the cache core which initiated
the snoop issued a write cycle on the memory bus. If the
response of the snoop to this 82495XP were a cache hit,
the contents would no longer be valid due to that write.

Following synchronization to the 82495XP CLK, the 82495XP
issues SNPCYC# (clock 5), indicating that the snoop
look-up is in progress. The results of the look-up are
driven to the memory bus via MTHIT# and MHITM# in the
next clock after SNPCYC#. Since the snoop was a miss in
the cache, both signals are inactive (clock 6). Note that
SNPBSY# will not be asserted since the snoop was a miss
to this cache. The snoop from another cache is complete
at this point, and the read miss cycle will continue.


The MBC asserts MAOE# to allow this 82495XP to drive its
address on the memory bus in order to complete the read
miss cycle. The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid after MAOE# assertion
(clock 6 for the read cycle in this example) and remains
valid until after CNA# is sampled active by the 82495XP
(clock 8). MALE and MBALE may be used to hold the address
as necessary.


The MBC arbitrates for the memory bus and returns BGT#
asserted (clock 6), indicating that the cycle is
guaranteed to complete on the memory bus. Once the
82495XP samples BGT# asserted, it must finish that cycle
on the memory bus. Prior to this point, the cycle can be
aborted by a snoop hit from another cache.


CNA# is asserted by the MBC (clock 7) to indicate that it
is ready to schedule a new memory bus cycle. Note that
after CNA# activation, cycle control signals are not
guaranteed to be valid.


When the MBC has determined the cacheability attribute of
the cycle, it drives the MKEN# signal accordingly. The
MBC also drives the KWEND# signal at this time,
indicating the end of the cacheability window. The
82495XP samples MKEN# during KWEND# (clock 7) to
determine that the cycle is indeed cacheable.


The MBC asserts SWEND# when the snoop window
ends on the memory bus. The 82495XP samples
MWB/WT# during SWEND# (clock 9) and updates
the cache tag state according to the consistency
protocol. The closure of the snoop window also
enables the MBC to start providing the CPU with
data that has been stored in the 82490XP’s memory
cycle buffer. The MBC supplies BRDY#s to the CPU
(clocks 9-12).


The read miss cycle ends when CRDY# is driven
active by the MBC (clock 12). It is at this time that
the data in the 82490XP’s memory cycle buffers is
loaded into the cache SRAM.
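

The cache control sequence of this read miss example can be collected into a small timeline. The sketch below only restates the clock numbers given in the text above; the dictionary layout and the function name are illustrative and are not part of the data book.

```python
# Clock numbers are taken from the read miss walkthrough above (Figure 8-8).
# The event descriptions and the data layout are illustrative only.
READ_MISS_EVENTS = {
    "SNPCYC# issued (snoop look-up in progress)": 5,
    "MTHIT#/MHITM# driven inactive (snoop miss)": 6,
    "address valid after MAOE# assertion": 6,
    "BGT# asserted by the MBC": 6,
    "CNA# asserted, MKEN# sampled during KWEND#": 7,
    "CNA# sampled active by the 82495XP": 8,
    "SWEND# asserted, first BRDY# to the CPU": 9,
    "CRDY# asserted, last BRDY# to the CPU": 12,
}


def timeline(events):
    """Return (clock, description) pairs sorted by clock number."""
    return sorted((clk, name) for name, clk in events.items())


for clk, name in timeline(READ_MISS_EVENTS):
    print(f"clock {clk:2d}: {name}")
```

Listing the events this way makes it easy to compare the example against a measured trace, clock by clock.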


Figure 8-8. Clocked Snooping Mode


MEMORY BUS SIGNALS:


The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory
address output enables (MAOE# and MBAOE#)
should be asserted by the MBC. (Note the use of
MAOE# for snooping at the beginning of the cache
control signals section.) MDOE# must be inactive to
allow the data pins to be used as inputs.


Some time after the address has been driven onto
the memory bus, data will be supplied from the
DRAM (main memory) to the 82490XP cache
SRAM.


For Clocked Memory Bus Mode, MSEL# is driven
active by the MBC (clock 6) to allow sampling of
MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.
MBRDY# is driven active by the MBC in clocks 7 to
9 to cause the memory burst counter to be
incremented and data to be placed into the 82490XP
cache memory cycle buffers. The MBC drives
MEOC# asserted (clock 10) to end the current cycle
on the memory bus and switch memory cycle buffers
for the new cycle. MZBT# is sampled at this time
(when MEOC# is sampled asserted and MSEL#
remains low) for the next transfer.
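

The MZBT# latching rule for clocked mode (track MZBT# while MSEL# is inactive, then use the value from the MCLK edge before MSEL# is first sampled active) can be sketched as follows. This is a behavioral illustration only; the function name and the list-of-samples representation are assumptions made for the example.

```python
def latched_mzbt(msel_n, mzbt_n):
    """Return the MZBT# level used for the transfer, or None.

    msel_n, mzbt_n: levels sampled on successive MCLK edges
    (0 = low/active, 1 = high/inactive).

    While MSEL# is sampled inactive, MZBT# is tracked. On the first
    edge where MSEL# is sampled active, the MZBT# value from the
    prior edge is the one that applies. None is returned if MSEL#
    never goes active or is already active on the first sample.
    """
    tracked = None
    for sel, zbt in zip(msel_n, mzbt_n):
        if sel == 1:
            tracked = zbt      # MSEL# inactive: keep sampling MZBT#
        else:
            return tracked     # MSEL# active: prior sample is used
    return None
```

For instance, with MSEL# samples `[1, 1, 0, 0]` and MZBT# samples `[1, 0, 1, 1]`, the value captured on the second edge (0) is the one used, regardless of what MZBT# does afterwards.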


For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 6) to allow MISTB
operation and to latch MZBT# (on the falling edge of
MSEL#) for the transfer. MISTB is toggled in clocks
8 to 10 to cause the memory burst counter to be
incremented, and data to be placed into the
82490XP cache memory cycle buffers. Note: MISTB
latches the memory bus data on both the rising and
falling edges. The MBC drives MEOC# asserted
(clock 11) to end the current cycle on the memory
bus and switch memory cycle buffers for the new
cycle. MZBT# for the next cycle is sampled at this
time on the falling edge of MEOC#.
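

Because MISTB latches data on both its rising and falling edges, counting strobe edges gives the number of data latch events in a burst. The helper below is a minimal sketch under that reading; the sampled-waveform representation is an assumption made for illustration.

```python
def count_strobe_edges(mistb):
    """Count rising plus falling edges in a list of MISTB levels.

    Each edge is treated as one latch event, since the strobe
    latches memory bus data on both edges.
    """
    return sum(1 for prev, cur in zip(mistb, mistb[1:]) if prev != cur)
```

A waveform sampled as `[0, 1, 0, 1, 0]` contains four edges and therefore four latch events.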


8.3.3 STROBED SNOOPING MODE 
(HIT TO [M] LINE) 


Figure 8-9 illustrates a snoop hit to a dirty line
sequence occurring simultaneously with a CPU
initiated read miss cycle. This example assumes strobed
snooping mode (i.e. requests for snoops are made
on the falling edge of SNPSTB#).


CACHE CONTROL SIGNALS:


In clock 1 (totally asynchronous to any clock)
SNPSTB# is asserted by the MBC, indicating to the
82495XP a request for snooping. The 82495XP
samples MAOE# (it must be inactive) in order to
recognize the snoop request. It is latched together
with the snoop address (MSET[0:10], MTAG[0:11],
MCFA[0:6]), SNPINV, MBAOE#, and SNPNCA on
the falling edge of SNPSTB#. The 82495XP issues
SNPCYC# (clock 3), indicating that the snoop
look-up is in progress. The results of the look-up are
driven to the memory bus via MTHIT# and MHITM# in
the next clock after SNPCYC#. Since the snoop hit
a modified line, both signals are asserted (clock 4).
SNPBSY# is also issued to indicate that the
82495XP is busy with CPU back-invalidations, the
82490XP’s snoop buffer is full, or a write back is to
follow. The 82495XP will accept snoops only when
SNPBSY# is inactive.
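

The snoop look-up results described in these examples can be summarized as a mapping from tag state to the two response signals. The text covers the miss case (both inactive) and the hit to a modified line (both asserted); the behavior for a hit to an unmodified [E] or [S] line is an assumption here, inferred from the signal names. The function name is hypothetical.

```python
def snoop_response(state):
    """Map a snooped line's tag state to (MTHIT#, MHITM#) assertion.

    state: "M", "E", "S", or "I" (invalid, i.e. a snoop miss).
    Returns two booleans; True means the active-low signal is asserted.

    The "E"/"S" result (hit without hit-modified) is an assumption;
    only the miss and modified-hit cases are stated in the text.
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError(f"unknown tag state: {state!r}")
    hit = state != "I"
    hit_modified = state == "M"
    return hit, hit_modified
```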


Simultaneously with the memory bus activity due to
the snoop request, the CPU initiates a read miss
cycle. The 82495XP issues a memory bus request
(CADS#), CDTS#, and cycle control signals to the
MBC in clock 1. The MBC must wait for the pending
snoop cycle to complete on the memory bus prior to
servicing this read miss cycle.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is not valid until MAOE#
goes active after CRDY# of the snoop write back
cycle is sampled active by the 82495XP and the
CADS# is reissued (clock 15).


In clock 5 the 82495XP issues SNPADS# and cycle
control signals to the MBC, indicating a request to
flush a modified line out of the cache. SNPADS#
activation causes the MBC to abort the pending read
miss cycle. It is the 82495XP’s responsibility to
re-issue the aborted cycle after the completion of the
write back, since BGT# was not asserted by the
MBC.
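

The abort rule used in this example (a pending cycle can be aborted only before BGT# is sampled asserted, and an aborted cycle is re-issued after the write back) can be written as a small decision function. The function name and the returned strings are hypothetical labels chosen for this sketch.

```python
def pending_cycle_action(bgt_sampled, snpads_issued):
    """Classify what happens to a pending CPU-initiated cycle.

    bgt_sampled:   True once the 82495XP has sampled BGT# asserted.
    snpads_issued: True if SNPADS# was issued for a write back of a
                   modified line while the cycle was pending.
    """
    if bgt_sampled:
        # After BGT#, the cycle is guaranteed to complete on the bus.
        return "must_complete"
    if snpads_issued:
        # Before BGT#: the MBC aborts the cycle, and the 82495XP
        # re-issues it after the write back completes.
        return "abort_then_reissue"
    return "pending"
```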


Data is loaded into the 82490XP’s snoop buffer.
Since SNPINV was sampled asserted by the
82495XP (clock 1) during SNPSTB# assertion, it
back-invalidated the CPU’s first level cache.


The 82495XP asserts CDTS# (clock 9), indicating to
the MBC that data is available in the snoop buffer.
When the MBC completes the write back cycle on the
memory bus, it activates CRDY# to the
82495XP/82490XP cache. At this time, the
82495XP deasserts SNPBSY# (clock 15) and
re-issues the aborted read miss cycle by asserting
CADS# and CDTS#.


Figure 8-9. Strobed Snooping Mode


MEMORY BUS SIGNALS:


For Clocked Memory Bus Mode, the memory data
output enable (MDOE#) is not activated by the MBC
to allow the memory data pins to be used as inputs.


MSEL# is driven active by the MBC (clock 2) to
allow sampling of MBRDY# and to latch MZBT# for
the read miss transfer. MZBT# is sampled on all
MCLK rising edges where MSEL# is inactive. Once
MSEL# is sampled active by the 82495XP, the
value of MZBT# sampled on the prior MCLK is used
for the next transfer.


Since the read miss cycle is aborted due to the
snoop hit to a modified line (requires a write back
cycle), no MEOC# is given. MSEL# is deasserted
by the MBC (clock 9) and reasserted (clock 11) to
allow latching of MZBT# for the snoop write back
cycle and sampling of MBRDY# for that cycle.
MFRZ# is also sampled at this time.


The memory data output enable (MDOE#) signal is
driven active by the MBC (clock 9) to drive the
memory data outputs.


MBRDY# is driven active by the MBC in clocks 11
to 13 to cause the memory burst counter to be
incremented and data to be written from the 82490XP
cache memory cycle buffers. The MBC drives
MEOC# asserted (clock 14) to end the write back
cycle on the memory bus and switch memory cycle
buffers for the new cycle. MZBT# and MFRZ# are
sampled at this time for the next data
transfer.


MDOE# is deasserted by the MBC (clock 16) to
allow the memory data pins to be used as inputs for
the reissued read cycle.


For Strobed Memory Bus Mode, the memory data
output enable (MDOE#) has not been asserted by
the MBC to allow the memory data pins to be used
as inputs for the read miss cycle.


MSEL# is asserted by the MBC (clock 2) to allow
sampling of MISTB and to latch MZBT# (on the falling
edge of MSEL#) for the read miss transfer.


Since the read miss cycle is aborted due to the
snoop hit to a modified line (requires a write back
cycle), no MEOC# is given. MSEL# is deasserted
by the MBC (clock 9) and reasserted (clock 11) to
allow latching of MZBT# for the snoop write back
cycle and sampling of MOSTB for that cycle.
MFRZ# is also sampled at this time.


MOSTB is toggled in clocks 12 to 14 to cause the
memory burst counter to be incremented, and data
to be read from the 82490XP cache memory cycle
buffers. Note: MOSTB latches the memory bus data
on both the rising and falling edges.


The MBC drives MEOC# asserted (clock 15) to end
the snoop write back cycle on the memory bus and
switch memory cycle buffers for the new cycle.
MZBT# and MFRZ# for the next cycle are
sampled at this time on the falling edge of MEOC#.


MDOE# is deasserted by the MBC (clock 16) to
allow the memory data pins to be used as inputs for
the reissued read miss cycle.


8.3.4 CACHE TO CACHE TRANSFER 


8.3.4.1 Read Cycles Causing a Snoop Hit
to [M] Line


Figure 8-10 illustrates CPU initiated Read cycles that
miss the 82495XP/82490XP cache and replace a
non-dirty (e.g. clean) line in the cache. During the
snoop window, the memory bus attribute which
indicates a direct to [M] state transfer is sampled active.


In such cycles, the 82495XP will instruct the MBC to
perform a cache line-fill cycle on the memory bus.
The request for data will not go to main memory, but
instead will go to the controller of the cache which
contained the modified data. The line is then written
into the 82490XP’s array, and data transferred to the
CPU as requested. If the line fetched from the
second cache replaces a line which is in valid
unmodified state ([E] or [S]), then a back-invalidation cycle
is performed on the CPU bus to guarantee that the
replaced data is also removed from the CPU’s first
level cache, thus maintaining the inclusion property.


CACHE CONTROL SIGNALS: 


The CPU initiates the read cycle to the
82495XP/82490XP cache where the cache tag
state is looked up. Once the 82495XP determines
the cycle to be a cache miss, it issues CADS#
(clock 2) and the associated cycle control signals to
the MBC (e.g. CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#) in order to schedule the cache line-fill
operation. MCACHE# is active, indicating that the
read miss is potentially cacheable by the 82495XP;
RDYSRC is active, indicating that the MBC must
supply BRDY#s to the CPU cache core.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 2 and 13 for the two read miss cycles in this
example) and remains valid until after CNA# is
sampled active by the 82495XP (clocks 5 and 16). MALE
and MBALE may be used to hold the address as
necessary.


Figure 8-10. Cache to Cache Transfer: Cacheable Read Miss


The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 3), indicating that the cycle is
guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish
that cycle on the memory bus. Prior to this point, the
cycle can be aborted by a snoop hit from another
cache.


CNA# is asserted by the MBC (clock 4) to indicate
that it is ready to schedule a new memory bus cycle.
Note that after CNA# activation, cycle control
signals are not guaranteed to be valid.


When the MBC has determined the cacheability
attribute of the cycle, it drives the MKEN# signal
accordingly. The MBC also drives the KWEND# signal
at this time, indicating the end of the cacheability
window. The 82495XP samples MKEN# and
MRO# during KWEND# (clock 5) to determine that
the cycle is indeed cacheable.


The MBC asserts SWEND# when the snoop window
ends on the memory bus. The 82495XP samples
MWB/WT# and DRCTM# during SWEND#
(clock 7) and updates the cache tag state according
to the consistency protocol. Since the result of the
snoop was a hit to a modified line in another cache,
the MBC asserts DRCTM# at this time (this is an
option to save time by skipping the main memory
access, not a requirement of the memory bus) so
that the tag state will go immediately to the [M]
state, skipping the [E] state. MWB/WT# must be in
write back mode (high) to assure this transition. The
closure of the snoop window also enables the MBC
to start providing the CPU with data that has been
stored in the 82490XP’s memory cycle buffer. The
MBC supplies BRDY#s to the CPU (clocks 7-10).
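

The tag state chosen at the close of the snoop window can be sketched as a function of the two sampled attributes. Only the direct-to-[M] case (DRCTM# asserted with MWB/WT# high) is spelled out in the text; the [E] result for write back mode without DRCTM# is implied by "skipping the [E] state", and the [S] result for write through mode is an assumption made for this illustration. MRO# is not modeled.

```python
def line_fill_tag_state(mwb_wt_high, drctm_asserted):
    """Return the tag state for a line fill, as sampled with SWEND#.

    mwb_wt_high:    True if MWB/WT# is high (write back mode).
    drctm_asserted: True if DRCTM# is sampled asserted.
    """
    if mwb_wt_high and drctm_asserted:
        return "M"   # direct to [M], as described in the text
    if mwb_wt_high:
        return "E"   # implied: the state that DRCTM# skips
    return "S"       # assumption: write through lines are held shared
```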


The 82495XP issues a new CADS# in clock 13,
which also misses the 82495XP/82490XP cache.
Since the 82495XP has already sampled CNA#
asserted (clock 4), it issues a new CADS# prior to
receiving CRDY# of the current cycle (i.e. this cycle
is pipelined within the MBC). Note that once the
cycle progress signals (BGT#, CNA#, KWEND#,
SWEND#) of a cycle are sampled asserted, the
82495XP ignores them until the CRDY# of that
cycle. The 82495XP does not pipeline the cycle
progress signals (i.e. it will not sample them again until
after CRDY# of the current memory bus cycle).


MEMORY BUS CYCLES:


At the start of this cycle, the master 82495XP does
not know that the data will be coming from a slave
82495XP/82490XP and begins a read request to
main memory to obtain the required data. Since the
snoop resulted in a hit to a modified line in the
second cache, the memory request must be backed off
so that the snooped 82495XP may supply the data.


The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory
address output enables (MAOE# and MBAOE#)
should be asserted by the MBC. The memory data
output enable signal (MDOE#) must remain inactive
to allow the data pins to be used as inputs.


For Clocked Memory Bus Mode, MSEL# is driven
active by the MBC (clock 4) to allow sampling of
MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.


MBRDY# is driven active in clocks 4 to 10 to read
data into the 82490XP cache memory cycle buffers.
The MBC asserts MEOC# (clock 11) to end the
read miss cycle on the memory bus and switch the
memory cycle buffers for a new cycle. MZBT# is
latched at this time for the next transfer. Note that
there are 8 transfers needed to fill the
82495XP/82490XP cache line and only 4 needed
for the CPU line fill.
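

The 8-versus-4 transfer counts follow from dividing the line size by the width of each transfer. The sketch below uses assumed values (an 8-byte transfer, a 64-byte second level cache line, and a 32-byte CPU line) purely to reproduce the ratio quoted in the text; the actual widths and line sizes depend on the configuration and are not stated in this section.

```python
def transfers_per_line(line_bytes, transfer_bytes=8):
    """Return the number of bus transfers needed to move one line.

    transfer_bytes defaults to 8 (a 64-bit transfer), which is an
    assumption for illustration, not a value taken from the text.
    """
    if line_bytes <= 0 or transfer_bytes <= 0:
        raise ValueError("sizes must be positive")
    if line_bytes % transfer_bytes:
        raise ValueError("line size must be a multiple of the transfer size")
    return line_bytes // transfer_bytes
```

With these assumed sizes, `transfers_per_line(64)` gives 8 and `transfers_per_line(32)` gives 4, matching the counts in the example.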


MBRDY# is again driven active by the MBC in
clocks 11 to 21 to cause the memory burst counter
to be incremented and data to be placed into the
82490XP cache memory cycle buffers for the
second read miss cycle.


For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 4) to allow MISTB
operation and to latch MZBT# for the transfer (on the
falling edge of MSEL#). MISTB is toggled in clocks
5 to 11 to cause the memory burst counter to be
incremented, and data to be placed into the
82490XP cache memory cycle buffers. Note: MISTB
latches the memory bus data on both the rising and
falling edges. The MBC drives MEOC# asserted
(clock 12) to end the current cycle on the memory
bus and switch memory cycle buffers for the new
cycle. MZBT# for the next cycle is latched at this
time on the falling edge of MEOC#.


The MBC toggles MISTB (clocks 16 to 21) for the
second read miss cycle to increment the memory
burst counter and cause data to be written into the
82490XP memory cycle buffers.


Figure 8-11. Read For Ownership


8.4 Read for Ownership


8.4.1 WRITE MISS WITH MFRZ# ASSERTED, 
FOLLOWED BY READ TO SAME LINE 


Figure 8-11 illustrates a Read For Ownership cycle.
First, a CPU initiates a write cycle which misses the
82495XP/82490XP cache. The MBC issues a “dummy”
write to main memory (the write does not actually
go out to main memory, to save valuable bus
time). The 82490XP MFRZ# input is used by the
MBC to indicate that the following line-fill (allocation)
data (from either main memory or another cache)
should be merged with the data of the write miss.
The entire line is then placed into the internal
tagRAM.
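

The merge of the frozen write data with the following line fill can be pictured as a byte overlay. The sketch below assumes that the bytes written by the CPU take precedence over the corresponding bytes of the incoming line; that precedence, the function name, and the byte-offset interface are assumptions made for illustration rather than details given in the text.

```python
def merge_line(fill, write_data, offset):
    """Overlay posted write bytes onto incoming line-fill bytes.

    fill:       bytes of the line arriving from memory or another cache.
    write_data: bytes of the write miss held in the memory cycle buffer.
    offset:     byte offset of the write within the line.

    Assumes the written bytes replace the fill bytes they overlap.
    """
    if offset < 0 or offset + len(write_data) > len(fill):
        raise ValueError("write does not fit inside the line")
    merged = bytearray(fill)
    merged[offset:offset + len(write_data)] = write_data
    return bytes(merged)
```

For example, merging the two bytes `AA BB` at offset 2 into an all-zero 8-byte line leaves every other byte of the fill data unchanged.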


CACHE CONTROL SIGNALS: 


The CPU initiates a write cycle to the
82495XP/82490XP cache where the cache tag
state is looked up. Once the 82495XP determines
the cycle to be a cache miss, it issues CADS#
(clock 1) and the associated cycle control signals to
the MBC (e.g. CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#, PALLC#) in order to schedule the write
operation. MCACHE# is not active; RDYSRC is not
active, indicating that the 82495XP will supply
BRDY#s to the CPU; PALLC# is active, indicating a
potential allocate cycle after the write through cycle.


The write miss data is posted in the 82490XP’s
memory cycle buffer, and the cycle completes with
no wait states to the CPU. The CPU is free to issue
another (non-related) cycle while the 82495XP is
processing the allocation. If this new cycle is a
cache hit, it will be serviced by the 82495XP
immediately; but if it is a cache miss, its service will wait
until the CRDY# of the allocation.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 5 for the write miss and allocation
cycle in this example) and remains valid until after
CNA# is sampled active by the 82495XP (clocks 4
and 10). MALE and MBALE may be used to hold the
address as necessary.


The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 2), indicating that the write
through cycle is guaranteed to complete on the
memory bus. Once the 82495XP samples BGT#
asserted, it must finish that cycle on the memory bus.
Prior to this point, the cycle can be aborted by a
snoop hit from another cache.


CNA# is asserted by the MBC (clock 3) to indicate 
that it is ready to schedule a new memory bus cycle. 
Note that after CNA# activation, cycle control sig- 
nals are not guaranteed to be valid. 


When the MBC has determined the cacheability
attribute of the write through cycle, it drives the
MKEN# signal accordingly. The MBC also drives
the KWEND# signal at this time, indicating the end
of the cacheability window. The 82495XP samples
MKEN# active during KWEND# (clock 3),
indicating that the missed line should be allocated in the
cache.


The MBC asserts SWEND# (clock 5) when the
snoop window of the write through cycle ends on the
memory bus. Note that the direct to [M] state
qualifier signal (DRCTM#) is sampled during SWEND#
and is inactive for the write. The MBC also issues
CRDY# to the 82495XP at this time so that the
82495XP treats the write cycle as completed on the
memory bus when, in fact, it did not go out to memory.


In this example, the 82495XP requests the allocation
cycle by issuing CADS# in clock 5. The cycle
control signals are valid at this point: MCACHE# is
active, indicating the cacheability of the line-fill cycle;
RDYSRC is not active, indicating that the MBC need
not supply BRDY#s to the CPU (no BRDY#s are
necessary for an allocation cycle).


Once again, the MBC arbitrates for the memory bus
and returns BGT# asserted (clock 6) for the
allocation cycle. The MBC asserts CNA#, KWEND#, and
SWEND# (clock 9) to pipeline the memory bus and
close the cacheability and snoop windows. Note that
(for this example) DRCTM# is asserted during
SWEND# to place the line in the modified state.
Since this is done, all other caches must invalidate
their copies.


CRDY# for the allocation (line-fill) cycle is issued by
the MBC in clock 11 to complete the read cycle on
the memory bus and place the data into the
82490XP cache array.


MEMORY BUS SIGNALS: 


The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in the flow through mode. If the
82495XP is the current bus master, the memory
address output enables (MAOE# and MBAOE#)
should be asserted by the MBC.


For Clocked Memory Bus Mode, the memory data 
output enable (MDOE#) has been asserted by the 
MBC to drive the memory data outputs. 


The MBC asserts MSEL# (clock 2) to allow
sampling of MBRDY# and to latch MZBT# and MFRZ#
for the write. MBRDY# and MEOC# are asserted
by the MBC (clock 3) to place the write data into the
memory cycle buffers, sample MZBT# and MFRZ#
for the next transfer, and end the current cycle on
the memory bus. MFRZ# is driven active by the
MBC here, indicating to the 82495XP that the data
of the write through will be merged with the following
allocation data.


For the allocation (line fill) cycle, MSEL# is driven
active again by the MBC (clock 6) to allow sampling
of MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.
MDOE# is also deasserted in clock 6 to allow the
data pins to be used as inputs for the allocation
cycle.


MBRDY# is driven active by the MBC in clocks 7 to
9 to cause the memory burst counter to be
incremented and data to be placed into the 82490XP
cache memory cycle buffers. During the line fill, the
82490XP will merge the data from the write through
buffer with the incoming data from either main
memory or another cache (if that line was a write hit to
[M] in another cache).


The MBC drives MEOC# asserted (clock 10) to end
the allocation cycle on the memory bus and switch
memory cycle buffers for the new cycle. MZBT# is
sampled at this time for the next data transfer.


For Strobed Memory Bus Mode, the memory data 
output enable (MDOE#) has been asserted by the 
MBC to drive the memory data outputs. 


The MBC asserts MSEL# (clock 2) to allow toggling
of MISTB and to latch MZBT# and MFRZ# for the
write (on MSEL# falling edge). MISTB is toggled
and MEOC# asserted by the MBC (clock 2) to place
the write data into the memory cycle buffers, sample
MZBT# and MFRZ# for the next transfer (on the
falling edge of MEOC# while MSEL# is active), and
end the current cycle on the memory bus. MFRZ# is
driven active by the MBC here, indicating to the
82495XP that the data of the write through will be
merged with the following allocation data.


For the allocation (line fill) cycle, MSEL# is driven
active again by the MBC (clock 7) to allow sampling
of MOSTB and to latch MZBT# for the transfer.
MDOE# is also deasserted in clock 7 to allow the
data pins to be used as inputs for the allocation
cycle.


MOSTB is toggled by the MBC in clocks 8 to 10 to
cause the memory burst counter to be incremented
and data to be placed into the 82490XP cache
memory cycle buffers. During the line fill, the 82490XP
will merge the data from the write through buffer with
the incoming data from either main memory or
another cache (if that line was a write hit to [M] in
another cache).


The MBC drives MEOC# asserted (clock 11) to end
the allocation cycle on the memory bus and switch
memory cycle buffers for the new cycle. MZBT# is
sampled at this time for the next data transfer.


8.5 I/O Cycles


Figure 8-12 illustrates CPU initiated I/O cycles, both
read and write. I/O writes are the only write cycles
not posted by the 82495XP/82490XP cache (i.e. the
cycle is not fully acknowledged to the CPU until it
has completed on the memory bus).


CACHE CONTROL SIGNALS: 


The CPU initiates an I/O write cycle to the
82495XP/82490XP. The 82495XP then issues
CADS# and CDTS# (clock 1) and the associated
cycle control signals to the MBC (e.g. CW/R#,
CM/IO#, CD/C#, RDYSRC, MCACHE#). MCACHE# is
not active, indicating that the cycle is not cacheable;
RDYSRC is active, indicating that the MBC must
supply BRDY#s to the CPU/Cache core.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 10 for the two I/O cycles in this example)
and remains valid until after CNA# is sampled active
by the 82495XP (clocks 6 and 17). MALE and
MBALE may be used to hold the address as necessary.


The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 2) for the I/O write cycle,
indicating that the cycle is guaranteed to complete on
the memory bus. Once the 82495XP samples BGT#
asserted, it must finish that cycle on the memory
bus. Prior to this point, the cycle can be aborted by a
snoop hit from another cache.


CNA# for the write cycle is asserted by the MBC
(clock 5) to indicate that it is ready to schedule a
new memory bus cycle. Note that SWEND# and
KWEND# are not needed for I/O cycles since they
are not cacheable.


The MBC asserts BRDY# in clock 7 to complete the
I/O write cycle from the CPU, and CRDY# in clock 8
to complete the cycle on the memory bus from the
82495XP/82490XP cache.


Figure 8-12. I/O Write and Read Cycles


A new CADS# is issued from the 82495XP in clock
10 for an I/O read cycle, along with the associated
cycle control signals. MCACHE# is again not active,
and RDYSRC is again active.


The MBC returns BGT# asserted right away (clock
11). The 82495XP can pipeline I/O cycles, but does
not for the I/O read in this example.


Upon completing the access on the memory bus,
the MBC activates BRDY# (clock 17) and CRDY#
(clock 16). Note that BRDY# of a cycle may come
before (as in the I/O write cycle of this example),
with, or after the CRDY# of the same cycle.


MEMORY BUS SIGNALS: 


The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory
address output enables (MAOE# and MBAOE#)
should be asserted by the MBC.


For Clocked Memory Bus Mode, the memory data
output enable signal (MDOE#) is asserted by the
MBC in clock 3 to drive the memory data outputs.


MEOC# is asserted by the MBC (clock 5) to latch
MZBT# for the I/O write transfer, and end that cycle
on the memory bus (MBRDY# is not necessary
since this example shows a single transfer cycle).
MZBT# is driven high by the MBC in order to force
the write cycle to begin with the correct burst
address. MFRZ# is also sampled here (it need not be
active since the cycle is not potentially allocatable).


For the I/O read cycle, MDOE# is deasserted (clock
12) by the MBC to allow the data pins to be used as
inputs.


MSEL# is driven active by the MBC (clock 12) to
allow sampling of MBRDY# and to latch MZBT# for
the transfer. MZBT# is sampled on all MCLK edges
where MSEL# is inactive. Once MSEL# is sampled
active by the 82495XP, the value of MZBT#
sampled on the prior MCLK is used for the next transfer.
Again, MZBT# is driven high by the MBC to force
the transfer to begin with the correct burst address.


The MBC asserts MBRDY# (clock 14) to cause the
memory burst counter to be incremented and data to
be placed into the 82490XP cache memory cycle
buffers. The MBC drives MEOC# asserted (clock
15) to end the read cycle on the memory bus and
switch memory cycle buffers for a new cycle.
MZBT# for the next transfer is latched at this time.


For Strobed Memory Bus Mode, the memory data
output enable signal (MDOE#) has been asserted
by the MBC to drive the memory data outputs.


MEOC# is asserted by the MBC (clock 5) to latch
MZBT# for the I/O write transfer (on MEOC# falling
edge), and end that cycle on the memory bus
(MOSTB is not necessary since this example shows
a single transfer cycle). MZBT# is driven high by the
MBC in order to force the write cycle to begin with
the correct burst address. MFRZ# is also sampled
here (it need not be active since the cycle is not
potentially allocatable).


For the I/O read cycle, MDOE# is deasserted (clock
10) by the MBC to allow the data pins to be used as
inputs.


MSEL# is driven active by the MBC (clock 10) to
allow operation of MISTB and to latch MZBT# for
the transfer (on MSEL# falling edge). Again,
MZBT# is driven high by the MBC to force the
transfer to begin with the correct burst address.


The MBC toggles MISTB (clock 15) to cause the
memory burst counter to be incremented and data to
be placed into the 82490XP cache memory cycle
buffers for the I/O read cycle. Note: MISTB latches
the memory bus data on both the rising and falling
edges. The MBC drives MEOC# asserted (clock 16)
to end the read cycle on the memory bus and switch
memory cycle buffers for a new cycle. MZBT# for
the next transfer is latched at this time (on the falling
edge of MEOC#).


8.6 LOCKed Cycles 


8.6.1 CPU READ MODIFY WRITE CYCLES 


The 82495XP provides a facility to allow atomic
accesses requested by the CPU (via CPU LOCK#
activation) through the 82495XP KLOCK# signal.
Figure 8-13 illustrates two back-to-back CPU initiated
Locked read-modify-write cycles. KLOCK# activation
indicates to the MBC that the memory bus
should not be released between the KLOCKed
cycles. KLOCK# will remain asserted from the
beginning of the first cycle (with CADS#) until one clock
after the CADS# of the last cycle. The 82495XP does
not distinguish between back-to-back locked
operations and will not open an arbitration window
(deassert KLOCK#) between them. It is the responsibility
of the MBC to distinguish between the multiple RMW
sequences, if it is so desired.
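

The KLOCK# rule above (asserted with the first CADS#, held until one clock after the last CADS#) can be expressed directly. The function name is hypothetical, and the clock list used in the usage note is the set of CADS# clocks quoted for this example (1, 5, 7, and 11).

```python
def klock_window(cads_clocks):
    """Return (first_clock, last_clock) of KLOCK# assertion.

    cads_clocks: clock numbers in which CADS# is issued for the
    locked cycles of one or more back-to-back locked sequences.
    KLOCK# is asserted with the first CADS# and remains asserted
    until one clock after the CADS# of the last cycle.
    """
    if not cads_clocks:
        raise ValueError("at least one CADS# clock is required")
    return min(cads_clocks), max(cads_clocks) + 1
```

For the two back-to-back sequences of this example, `klock_window([1, 5, 7, 11])` yields `(1, 12)`: no arbitration window opens between the first locked write and the second locked read.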


The 82495XP issues a request for a memory bus
access (CADS#) for every locked cycle (read or
write) regardless of whether it hits the cache tag state or not.
Locked read cycles are treated by the 82495XP as
cache misses, and, if the line is in the [M] state, the
82495XP ignores the data on the memory bus and
uses the data in the 82490XP array. Locked write
cycles are treated as write through, and the tag state
does not change even if the line is in the 82490XP
array.
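

The handling of locked cycles described above reduces to two small rules, sketched here. The function names and the returned strings are hypothetical; the behavior follows the paragraph above.

```python
def locked_read_data_source(state):
    """Return where the data of a locked read is taken from.

    A locked read always goes to the memory bus, but if the line is
    in the [M] state the memory bus data is ignored and the data in
    the cache array is used instead.
    """
    return "cache_array" if state == "M" else "memory_bus"


def locked_write_next_state(state):
    """Return the tag state after a locked write.

    Locked writes are treated as write through, and the tag state
    does not change even if the line is present in the array.
    """
    return state
```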


CACHE CONTROL SIGNALS: 


The CPU initiates a Locked read cycle to the
82495XP/82490XP cache where, due to the
assertion of CPU LOCK#, it assumes a cache miss and
issues CADS# to the MBC (clock 1) along with the
associated cycle control signals (e.g. CW/R#,
CM/IO#, CD/C#, RDYSRC, MCACHE#). MCACHE# is
never asserted for LOCKed cycles; RDYSRC is
active, indicating that the MBC must supply BRDY# to
the CPU/Cache core.


The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 5, then 7 and 11 for the two locked
RMW sequences in this example) and remains valid
until after CNA# is sampled active by the 82495XP
(clocks 3 and 7, then 9 and 13). MALE and MBALE
may be used to hold the address as necessary.


The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 2), indicating that the cycle is
guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish
that cycle on the memory bus. Prior to this point, the
cycle can be aborted by a snoop hit from another
cache.


CNA# for the read cycle is also asserted by the 
MBC (clock 2) to indicate that it may schedule a new 
memory bus cycle. Note that the cycle control sig- 
nals are not guaranteed to be valid after CNA# acti- 
vation. 


The MBC asserts BRDY# to the CPU/Cache core
in clock 4. CRDY# for the locked read cycle is
asserted to the 82495XP/82490XP from the MBC
(clock 5) to load the data stored in the 82490XP’s
memory cycle buffers into the cache array. If the
read was to a dirty line, the 82495XP ignores
the data in the memory cycle buffers
and uses the data in the cache array.


Locked sequences always end in a write cycle, no 
new CPU initiated cycles may be inserted between 
the Locked read and Locked write cycles. Therefore, 


(MSET[10:0], — 
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the 82495XP issues a new memory cycle request 


(CADS # in clock 5) for the Locked write as soon as 
it completes the Locked read cycle. The cycle con- 
trol signals are also valid at this time. RDYSRC is not 
active, indicating that the 82495XP will supply 
BRDY # to the CPU. 


The locked write cycle is posted like any other mem- 
ory write cycle. 


In this example, the CPU initiates a second
read-modify-write cycle immediately. KLOCK# is not
deasserted between the back-to-back locked
sequences since the CPU LOCK# remains asserted. If
snooping is required between these cycles, it is the
MBC's responsibility to predict this boundary and
allow snooping. The 82495XP issues a memory bus
request (CADS#) in clock 7 for the second locked
read cycle, along with the new cycle control signals.


The second locked RMW sequence repeats the
actions of the first. Its purpose in this example is to
demonstrate that an arbitration window may not
open between locked sequences if they follow one
another with no idle or non-locked cycles between
them.


MEMORY BUS SIGNALS: 


The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow-through mode. If the
82495XP is the current bus master, the memory
address output enables (MAOE# and MBAOE#)
should be asserted by the MBC.


For Clocked Memory Bus Mode, MSEL# is driven
active by the MBC (clock 3) to allow sampling of
MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.


The memory data output enable signal (MDOE#)
must be inactive to allow the data pins to be used as
inputs for the first locked read cycle. The MBC
asserts MEOC# (clock 4) to latch MZBT# for the next
transfer, and end the current locked read cycle on
the memory bus (MBRDY# is not necessary since
this example shows a single transfer cycle). MZBT#
is driven high by the MBC in order to force the read
cycle to begin with the correct burst address.


For the locked write cycle, MDOE# is asserted by
the MBC (clock 5) to drive the memory data outputs.
MEOC# is again asserted (clock 6) to latch MZBT#
for the next transfer, and end the current locked
write cycle on the memory bus (MBRDY# is not
necessary since this is a single transfer cycle).
MZBT# is again driven high. MFRZ# is also
sampled during write cycles when MEOC# is sampled
active by the 82495XP.


MDOE# is deasserted by the MBC (clock 7) to allow
the data pins to be used as inputs for the second
locked read cycle. MEOC# is again asserted (clock
8) to latch MZBT# for the next transfer, and end the
locked read cycle on the memory bus. MZBT# is
again driven high.


MDOE# is asserted by the MBC (clock 9) to drive
the memory data outputs for the second locked write
cycle. MBRDY# is asserted (clock 13) to cause the
memory burst counter to be incremented and data to
be placed into the 82490XP cache memory cycle
buffers. The MBC drives MEOC# active and
MSEL# inactive (clock 14) to end the locked write
cycle on the memory bus and switch memory cycle
buffers for a new cycle. MZBT# and MFRZ# for the
next transfer are sampled at this time.


For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 1) to allow sampling of
MxSTB and to latch MZBT# for the first locked read
transfer (on the falling edge of MSEL#).


The memory data output enable signal (MDOE#)
must be inactive to allow the data pins to be used as
inputs for the first locked read cycle. The MBC
asserts MEOC# (clock 3) to latch MZBT# for the next
transfer (on MEOC# falling edge while MSEL# is
active), and end the current locked read cycle on the
memory bus (MISTB is not necessary since this
example shows a single transfer cycle). MZBT# is
driven high by the MBC in order to force the read
cycle to begin with the correct burst address.


For the locked write cycle, MDOE# is asserted by
the MBC (clock 4) to drive the memory data outputs.
MEOC# is again asserted (clock 6) to latch MZBT#
for the next transfer, and end the current locked
write cycle on the memory bus (MOSTB is not
necessary since this is a single transfer cycle). MZBT#
is again driven high. MFRZ# is also sampled on the
falling edge of MEOC#.


MDOE# is deasserted by the MBC (clock 7) to allow
the data pins to be used as inputs for the second
locked read cycle. MEOC# is again asserted (clock
8) to latch MZBT# for the next transfer, and end the
locked read cycle on the memory bus. MZBT# is
again driven high.
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MDOE# is asserted by the MBC (clock 9) to drive 
the memory data outputs for the second locked write 
cycle. MOSTB is toggled (clock 12) to cause the 
memory burst counter to be incremented and data to 
be placed into the 82490XP cache memory cycle 
buffers. The MBC drives MEOC# active and
MSEL# inactive (clock 13) to end the locked write
cycle on the memory bus and switch memory cycle
buffers for a new cycle. MZBT# and MFRZ# for the
next transfer are sampled at this time.


9.0 TESTABILITY 


Testing the 82495XP/82490XP chipset can be
divided into three categories: Built-In Self Test (BIST),
Boundary Scan, and external testing. BIST performs
basic device testing on the 82495XP. Boundary
Scan provides additional test hooks that conform to
the IEEE Standard Test Access Port and Boundary
Scan Architecture (IEEE Std. 1149.1). Additional
testing can be performed by using software written
to test the 82490XP cache SRAM.


9.1 Built-In Self Test (BIST) 


BIST tests the internal functionality of the 82495XP.
The 82495XP's BIST tests approximately 90% of
the cache controller. It tests the tag RAM and
comparators.


The 82495XP BIST is initiated by driving
SLFTST# (CRDY#) low and HIGHZ# (MBALE) high
at least 10 clocks before RESET goes inactive. The
82495XP Cache Controller reports the result of BIST
on the CAHOLD signal. When the self test
completes, the 82495XP drives FSIOUT# inactive and
the BIST result on CAHOLD. If CAHOLD is driven
active, the BIST passed. If CAHOLD is
driven inactive, BIST detected a flaw in the cache
controller. CAHOLD is valid for one clock after
FSIOUT# deactivation and should be sampled on
the rising edge of FSIOUT#.
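As a rough illustration of the sampling rule, the Python sketch below scans per-clock captures of FSIOUT# and CAHOLD and reads the result at the FSIOUT# rising edge. The capture format (one 0/1 level per clock) and the function name are assumptions made for this example only.

```python
def bist_result(fsiout_n, cahold):
    """Return "pass" or "fail" from per-clock pin levels, or None if
    FSIOUT# never deactivates in the capture.

    fsiout_n -- levels of FSIOUT# (active low), one entry per clock
    cahold   -- levels of CAHOLD, same length (hypothetical capture format)
    """
    for clk in range(1, len(fsiout_n)):
        # FSIOUT# is active low, so 0 -> 1 is its deactivation (rising edge).
        if fsiout_n[clk - 1] == 0 and fsiout_n[clk] == 1:
            # CAHOLD active (1) means BIST passed, inactive (0) means it failed.
            return "pass" if cahold[clk] == 1 else "fail"
    return None


print(bist_result([0, 0, 1, 1], [0, 0, 1, 0]))
```

Because CAHOLD is only valid for one clock, later samples in the capture are deliberately ignored.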


On the 82495XP, BIST only informs the system that
a failure did or did not occur. BIST is not able to
indicate where a failure occurred. After completing
BIST, the cache controller performs a reset and begins
normal operation.


9.2 Boundary Scan 


The 82495XP/82490XP chipset provides additional
testability features compatible with the IEEE
Standard Test Access Port and Boundary Scan
Architecture (IEEE Std. 1149.1). The test logic provided
allows for testing to ensure that components function
correctly, that interconnections between various
components are correct, and that various
components interact correctly on the printed circuit board.


The boundary scan test logic consists of a boundary 
scan register and support logic that are accessed 
through a test access port (TAP). The TAP provides 
a simple serial interface that makes it possible to 
test all signal traces with only a few probes. 


The TAP can be controlled via a bus master. The 
bus master can be either automatic test equipment 
or a component (PLD) that interfaces to the four-pin 
test bus. 


9.2.1 BOUNDARY SCAN ARCHITECTURE 


The boundary scan test logic contains the following 
elements: 


— Test access port (TAP), consisting of input pins
TMS, TCK, and TDI; and output pin TDO.


— TAP controller, which interprets the inputs on the
test mode select (TMS) line and performs the
corresponding operation. The operations
performed by the TAP include controlling the
instruction and data registers within the
component.


— Instruction register (IR), which accepts
instruction codes shifted into the test logic on the test
data input (TDI) pin. The instruction codes are
used to select the specific test operation to be
performed or the test data register to be
accessed.


— Test data registers: The 82495XP/82490XP 
chipset components each contain three test data 
registers: Bypass register (BPR), Device Identifi- 
cation register (DID), and Boundary Scan regis- 
ter (BSR). 


The instruction and test data registers are separate
shift-register paths connected in parallel and have a
common serial data input and a common serial data
output connected to the TAP signals, TDI and TDO,
respectively.


9.2.2 DATA REGISTERS 


The 82495XP and 82490XP both contain the two
required test data registers: the bypass register and
the boundary scan register. In addition, they also have a
device identification register.
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Each test data register is serially connected to TDI 
and TDO, with TDI connected to the most significant 
bit and TDO connected to the least significant bit of 
the test data register. Data is shifted one stage (bit 
position within the register) on each rising edge of 
the test clock (TCK).
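The shift direction described above (TDI feeds the most significant bit, TDO takes the least significant bit) can be illustrated with a small Python sketch. The list representation and the function names are assumptions of this example; they are not part of the device definition.

```python
def shift_once(register, tdi):
    """Shift a test data register by one stage on a TCK rising edge.

    register -- list of bits, most significant bit first
    tdi      -- bit presented on TDI
    Returns (new_register, tdo), where tdo is the bit that leaves the
    least significant stage.
    """
    tdo = register[-1]
    return [tdi] + register[:-1], tdo


def scan(register, bits_in):
    """Apply one shift per input bit; return (final_register, bits_out)."""
    bits_out = []
    for bit in bits_in:
        register, tdo = shift_once(register, bit)
        bits_out.append(tdo)
    return register, bits_out


final, out = scan([1, 0, 1, 1], [0, 0, 1, 1])
print(final, out)
```

Note that the first bit shifted in ends up in the least significant position once the register has been completely refilled.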


9.2.2.1 Bypass Register 


The Bypass Register is a one-bit shift register that 
provides the minimal length path between TDI and 
TDO. This path can be selected when no test opera- 
tion is being performed by the component to allow 
rapid movement of test data to and from other
components on the board. While the bypass register is
selected, data is transferred from TDI to TDO without
inversion.
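To show why the bypass path speeds up board-level scans, the sketch below passes a bit stream through a chain of devices that all have the bypass register selected. Each device adds exactly one TCK of delay. The function name is hypothetical, and the registers are assumed to start at 0, which is the value IEEE Std. 1149.1 specifies for the bypass register at capture.

```python
def shift_through_bypass(bits_in, n_devices):
    """Return the bits seen at the end of a chain of n_devices one-bit
    bypass registers, one output bit per TCK rising edge."""
    regs = [0] * n_devices          # assumed initial contents
    bits_out = []
    for bit in bits_in:
        bits_out.append(regs[-1])   # last device drives the chain output
        regs = [bit] + regs[:-1]    # every register shifts by one stage
    return bits_out


print(shift_through_bypass([1, 0, 1, 1, 0, 0], 2))
```

With two devices in bypass, the input pattern reappears unchanged and uninverted two clocks later.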


9.2.2.2 Boundary Scan Register 


The Boundary Scan Register is a single shift register 
path containing the boundary scan cells that are 
connected to all input and output pins of the 
82495XP/82490XP chipset. Figure 9-1 shows the
logical structure of the boundary scan register. While 
output cells determine the value of the signal driven 
on the corresponding pin, input cells only capture 
data; they do not affect the normal operation of the 
device. Data is transferred without inversion from 
TDI to TDO through the boundary scan register dur- 
ing scanning. The boundary scan register can be op- 
erated by the EXTEST and SAMPLE instructions. 
The boundary scan register order is described in 
section 9.2.5. 


9.2.2.3 Device Identification Register 


The Device Identification Register contains the
manufacturer's identification code, part number code,
and version code in the format shown in Figure 9-2.
Table 9-1 lists the codes corresponding to the
82495XP and 82490XP.
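As an illustration of how the three fields can be separated once the register has been shifted out, the Python sketch below assumes the standard IEEE Std. 1149.1 layout: bit 0 fixed at 1, an 11-bit manufacturer identity in bits 11:1, a 16-bit part number in bits 27:12, and a 4-bit version in bits 31:28. Whether these parts follow that layout exactly is an assumption here, and the value decoded is made up for the example; it is not an actual 82495XP or 82490XP code.

```python
def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit device identification value into its fields,
    assuming the standard IEEE Std. 1149.1 bit layout."""
    if idcode & 1 != 1:
        raise ValueError("bit 0 of a device identification code must be 1")
    return {
        "version": (idcode >> 28) & 0xF,
        "part_number": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
    }


# Hypothetical value, used only to exercise the field boundaries.
fields = decode_idcode(0x2ABCD013)
print({name: hex(value) for name, value in fields.items()})
```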


Table 9-1. Device ID Register Values


Version Code    Component Code    Manufacturer Identity


82495XP (B0)


Bh
82490XP (A0 or A1)    00h
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Figure 9-1. Boundary Scan Register Structure 
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Figure 9-2. Device ID Register 


9.2.2.4 Runbist Register 


The Runbist Register is a one-bit register used to
report the results of the 82495XP BIST when it is
initiated by the RUNBIST instruction. This register is
loaded with a "0" prior to invoking the BIST and is
loaded with "1" upon successful completion. "0"
indicates a failure occurred during BIST.


NOTE:
82495XP RUNBIST is not available in the A-stepping.


9.2.3 INSTRUCTION REGISTER 


The Instruction Register (IR) allows instructions to 
be serially shifted into the device. The instruction 
selects the particular test to be performed, the test 
data register to be accessed, or both. The instruc- 
tion register is four (4) bits wide. The most significant 
bit is connected to TDI and the least significant bit is 
connected to TDO. There are no parity bits associat- 
ed with the Instruction register. Upon entering the 
Capture-IR TAP controller state, the Instruction
register is loaded with the default instruction "0001",
SAMPLE/PRELOAD. Instructions are shifted into
the instruction register on the rising edge of TCK 
while the TAP controller is in the Shift-IR state. 
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9.2.3.1 82495XP Boundary Scan Instruction Set 


The 82495XP cache controller supports all three
mandatory boundary scan instructions (BYPASS,
SAMPLE/PRELOAD, and EXTEST) along with one
optional instruction (IDCODE). On the B-stepping of
the 82495XP two additional optional instructions will
be implemented (RUNBIST and TRISTATE). Table
9-2 lists the 82495XP boundary scan instruction
codes. The instructions listed as PRIVATE cause
TDO to become enabled in the Shift-DR state and
cause "0" to be shifted out of TDO on the rising
edge of TCK. Execution of the PRIVATE instructions
will not cause hazardous operation of the 82495XP.
Note that system tests should not execute
instruction codes labeled "RESERVED". These
instructions can put the component in an indeterminate
state which can only be cleared by power-on reset.


Table 9-2. 82495XP Boundary Scan Instruction Codes


Instruction Code
1100 PRIVATE


* RUNBIST and TRISTATE are boundary scan instructions 
that will be implemented in the B-stepping of the 82495XP. 
They are not available on the A-stepping. 


EXTEST The instruction code is "0000". The
EXTEST instruction allows testing of
circuitry external to the component
package, typically board interconnects. It
does so by driving the values loaded
into the 82495XP boundary scan
register out on the output pins corresponding
to each boundary scan cell and
capturing the values on 82495XP input pins
to be loaded into their corresponding
boundary scan register locations. I/O
pins are selected as input or output,
depending on the value loaded into their
control setting locations in the boundary
scan register. Values shifted into input
latches in the boundary scan register
are never used by the internal logic of
the 82495XP. Note: after using the
EXTEST instruction, the 82495XP must be
reset before normal (non-boundary
scan) use.


SAMPLE/PRELOAD The instruction code is "0001".
The SAMPLE/PRELOAD instruction performs
two functions. When the TAP
controller is in the Capture-DR state, the
SAMPLE/PRELOAD instruction allows a
"snap-shot" of the normal operation of
the component without interfering with
that normal operation. The instruction
causes boundary scan register cells
associated with outputs to sample the
value being driven by the 82495XP. It
causes the cells associated with inputs to
sample the value being driven into the
82495XP. On both outputs and inputs
the sampling occurs on the rising edge
of TCK. When the TAP controller is in
the Update-DR state, the SAMPLE/
PRELOAD instruction preloads data to
the device pins to be driven to the board
by executing the EXTEST instruction.
Data is preloaded to the pins from the
boundary scan register on the falling
edge of TCK.


IDCODE The instruction code is "0010". The
IDCODE instruction selects the device
identification register to be connected to
TDI and TDO, allowing the device's
identification code to be shifted out of the
device on TDO. Note that the device
identification register is not altered by
data being shifted in on TDI.


BYPASS The instruction code is "1111". The
BYPASS instruction selects the bypass
register to be connected to TDI and
TDO, effectively bypassing the test logic
on the 82495XP by reducing the shift
length of the device to one bit. Note that
an open circuit fault in the board level
test data path will cause the bypass
register to be selected following an
instruction scan cycle due to the pull-up
resistor on the TDI input. This has been done
to prevent any unwanted interference
with the proper operation of the system
logic.
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RUNBIST The instruction code is “0111”. The 
RUNBIST instruction selects the one (1)
bit runbist register, loads a value of "0"
into the runbist register, and connects it
to TDO. It also initiates the built-in self
test (BIST) feature of the 82495XP,
which is able to detect approximately
90% of the stuck-at faults on the
82495XP. The 82495XP AC/DC
specifications for VCC and CLK must be met
and reset must have been asserted at
least once prior to executing the
RUNBIST boundary scan instruction.
After loading the RUNBIST instruction
code in the instruction register, the TAP
controller must be placed in the
Run-Test/Idle state. BIST begins on the first
rising edge of TCK after entering the
Run-Test/Idle state. The TAP controller
must remain in the Run-Test/Idle state
until BIST is completed. It requires 100K
clock (CLK) cycles to complete BIST
and report the result to the runbist
register. After completing the 100K clock
(CLK) cycles, the value in the runbist
register should be shifted out on TDO
during the Shift-DR state. A value of "1"
being shifted out on TDO indicates BIST
successfully completed. A value of "0"
indicates a failure occurred. After
executing the RUNBIST instruction, the
82495XP must be reset prior to normal
operation. NOTE: This instruction is not
available on the A-stepping of the
82495XP. It will be implemented in the
B-stepping.

TRISTATE The instruction code is "1000". The
TRISTATE instruction initiates the
tristate output test mode. After loading the
TRISTATE boundary scan instruction
into the instruction register, the TAP
controller must be placed in the
Run-Test/Idle state. To terminate the tristate
output test mode, the 82495XP must be
reset. NOTE: This instruction is not
available on the A-stepping of the
82495XP. It will be implemented in the
B-stepping.
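The instruction codes quoted in the descriptions above can be collected into a small lookup, sketched here in Python. Only the six named 82495XP instructions are included, since those are the codes stated in the text. The stepping check and all names in the code are assumptions of this sketch rather than anything defined by the device.

```python
# Codes as quoted in the 82495XP instruction descriptions.
CODES_82495XP = {
    "EXTEST": "0000",
    "SAMPLE/PRELOAD": "0001",
    "IDCODE": "0010",
    "RUNBIST": "0111",
    "TRISTATE": "1000",
    "BYPASS": "1111",
}

# Instructions the text says are not available on the A-stepping.
B_STEPPING_ONLY = {"RUNBIST", "TRISTATE"}


def instruction_code(name: str, stepping: str) -> str:
    """Return the 4-bit code for an instruction, refusing instructions
    that the given stepping ("A" or "B") does not implement."""
    if name in B_STEPPING_ONLY and stepping.upper().startswith("A"):
        raise ValueError(f"{name} is not available on the A-stepping")
    return CODES_82495XP[name]


print(instruction_code("RUNBIST", "B"))
```

A test program built this way fails early instead of shifting a B-stepping instruction into an A-stepping part.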


9.2.3.2 82490XP Boundary Scan Instruction Set 


The 82490XP cache SRAM supports all three
mandatory boundary scan instructions (BYPASS,
SAMPLE/PRELOAD, and EXTEST) along with one
optional instruction (IDCODE). Table 9-3 lists the
82490XP boundary scan instruction codes. The
instructions listed as PRIVATE cause TDO to become
enabled in the Shift-DR state and cause "0" to be
shifted out of TDO on the rising edge of TCK.
Execution of the PRIVATE instructions will not cause
hazardous operation of the 82490XP. Note that system
tests should not execute instruction codes labeled
"INTEL RESERVED". These instructions can put
the component in an indeterminate state which can
only be cleared by power-on reset.


Table 9-3. 82490XP Boundary Scan Instruction Codes


INTEL RESERVED


EXTEST The instruction code is "0000". The
EXTEST instruction allows testing of
circuitry external to the component
package, typically board interconnects. It
does so by driving the values loaded
into the 82490XP boundary scan
register out on the output pins corresponding
to each boundary scan cell and
capturing the values on 82490XP input pins to
be loaded into their corresponding
boundary scan register locations. I/O
pins are selected as input or output,
depending on the value loaded into their
control setting locations in the boundary
scan register. Values shifted into input
latches in the boundary scan register
are never used by the internal logic of
the 82490XP. Note: after using the
EXTEST instruction, the 82490XP must be
reset before normal (non-boundary
scan) use.


SAMPLE/PRELOAD The instruction code is "0001".
The SAMPLE/PRELOAD instruction performs
two functions. When the TAP
controller is in the Capture-DR state, the
SAMPLE/PRELOAD instruction allows a
"snap-shot" of the normal operation of
the component without interfering with
that normal operation. The instruction
causes boundary scan register cells
associated with outputs to sample the
value being driven by the 82490XP. It
causes the cells associated with inputs to
sample the value being driven into the
82490XP. On both outputs and inputs
the sampling occurs on the rising edge
of TCK. When the TAP controller is in
the Update-DR state, the SAMPLE/
PRELOAD instruction preloads data to
the device pins to be driven to the board
by executing the EXTEST instruction.
Data is preloaded to the pins from the
boundary scan register on the falling
edge of TCK.


IDCODE The instruction code is "0010". The
IDCODE instruction selects the device
identification register to be connected to
TDI and TDO, allowing the device's
identification code to be shifted out of the
device on TDO. Note that the device
identification register is not altered by
data being shifted in on TDI.


BYPASS The instruction code is "1111". The
BYPASS instruction selects the bypass
register to be connected to TDI and
TDO, effectively bypassing the test logic
on the 82490XP by reducing the shift
length of the device to one bit. Note that
an open circuit fault in the board level
test data path will cause the bypass
register to be selected following an
instruction scan cycle due to the pull-up
resistor on the TDI input. This has been done
to prevent any unwanted interference
with the proper operation of the system
logic.


9.2.4 TEST ACCESS PORT (TAP) CONTROLLER


The TAP controller is a synchronous, finite state
machine. It controls the sequence of operations of the
test logic. The TAP controller changes state only in
response to the following events:

1. A rising edge of TCK


2. Power-up.


The value of the test mode select (TMS) input signal
at a rising edge of TCK controls the sequence of the
state changes. The state diagram for the TAP
controller is shown in Figure 9-3. Test designers must
consider the operation of the state machine in order
to design the correct sequence of values to drive on
TMS.
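When designing TMS sequences it can help to have the state machine in executable form. The Python sketch below encodes the transitions given in the subsections of 9.2.4 as a lookup table. The exits from Test-Logic-Reset, Update-DR, and Update-IR are not spelled out in the text and are taken here from IEEE Std. 1149.1. The table and function names are assumptions of this sketch.

```python
# state: (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def run(state, tms_bits):
    """Apply one TMS value per TCK rising edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# Five rising edges with TMS high reach Test-Logic-Reset from any state.
print(all(run(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))
print(run("Run-Test/Idle", [1, 0, 0]))
```

A table like this makes it straightforward to check a planned TMS sequence before driving it on hardware.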


9.2.4.1 Test-Logic-Reset State


In this state, the test logic is disabled so that normal
operation of the device can continue unhindered.
This is achieved by initializing the instruction register
such that the IDCODE instruction is loaded. No
matter what the original state of the controller, the
controller enters Test-Logic-Reset state when the TMS
input is held high (1) for at least five rising edges of
TCK. The controller remains in this state while TMS
is high. The TAP controller is also forced to enter
this state at power-up.


9.2.4.2 Run-Test/Idle State 


This is a controller state between scan operations. Once in
this state, the controller remains in this state as
long as TMS is held low. In devices supporting the
RUNBIST instruction, the BIST is performed during
this state and the result is reported in the runbist
register. For instructions not causing functions to
execute during this state, no activity occurs in the test
logic. The instruction register and all test data
registers retain their previous state. When TMS is high
and a rising edge is applied to TCK, the controller
moves to the Select-DR-Scan state.


9.2.4.3 Select-DR-Scan State 


This is a temporary controller state. The test data 
register selected by the current instruction retains its 
previous state. If TMS is held low and a rising edge 
is applied to TCK when in this state, the controller 
moves into the Capture-DR state, and a scan se- 
quence for the selected test data register is initiated. 
If TMS is held high and a rising edge is applied to 
TCK, the controller moves to the Select-IR-Scan 
State. 


The instruction does not change in this state. 
Figure 9-3. TAP Controller State Diagram


9.2.4.4 Capture-DR State


In this state, the boundary scan register captures
input pin data if the current instruction is EXTEST or
SAMPLE/PRELOAD. The other test data registers,
which do not have parallel input, are not changed.


The instruction does not change in this state.


When the TAP controller is in this state and a rising
edge is applied to TCK, the controller enters the
Exit1-DR state if TMS is high or the Shift-DR state if
TMS is low.


9.2.4.5 Shift-DR State


In this controller state, the test data register
connected between TDI and TDO as a result of the
current instruction shifts data one stage toward its
serial output on each rising edge of TCK.


The instruction does not change in this state. 


When the TAP controller is in this state and a rising 
edge is applied to TCK, the controller enters the 
Exit1-DR state if TMS is high or remains in the Shift- 
DR state if TMS is low. 
9.2.4.6 Exit1-DR State


This is a temporary state. While in this state, if TMS
is held high, a rising edge applied to TCK causes the
controller to enter the Update-DR state, which
terminates the scanning process. If TMS is held low and a
rising edge is applied to TCK, the controller enters
the Pause-DR state.


The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


9.2.4.7 Pause-DR State 


The pause state allows the test controller to tempo- 
rarily halt the shifting of data through the test data 
register in the serial path between TDI and TDO. An 
example of using this state could be to allow a tester 
to reload its pin memory from disk during application 
of a long test sequence. 


The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


The controller remains in this state as long as TMS
is low. When TMS goes high and a rising edge is
applied to TCK, the controller moves to the Exit2-DR
state.


9.2.4.8 Exit2-DR State 


This is a temporary state. While in this state, if TMS 
is held high, a rising edge applied to TCK causes the 
controller to enter the Update-DR state, which termi- 
nates the scanning process. If TMS is held low and a 
rising edge is applied to TCK, the controller enters 
the Shift-DR state. 


The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


9.2.4.9 Update-DR State 


The boundary scan register is provided with a 
latched parallel output to prevent changes at the 
parallel output while data is shifted in response to 
the EXTEST and SAMPLE/PRELOAD instructions. 
When the TAP controller is in this state and the 
boundary scan register is selected, data is latched 
onto the parallel output of this register from the shift- 
register path on the falling edge of TCK. The data 
held at the latched parallel output does not change 
other than in this state. 
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All shift-register stages in test data register selected 
by the current instruction retain their previous value
during this state. The instruction does not change in
this state.


9.2.4.10 Select-IR-Scan State 


This is a temporary controller state. The test data 
register selected by the current instruction retains its 
previous state. If TMS is held low and a rising edge 
is applied to TCK when in this state, the controller 
moves into the Capture-IR state, and a scan se- 
quence for the instruction register is initiated. If TMS 
is held high and a rising edge is applied to TCK, the 
controller moves to the Test-Logic-Reset state. 


The instruction does not change in this state. 


9.2.4.11 Capture-IR State 


In this controller state the shift register contained in
the instruction register loads the fixed value "0001"
on the rising edge of TCK.


The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


When the controller is in this state and a rising edge
is applied to TCK, the controller enters the Exit1-IR
state if TMS is held high, or the Shift-IR state if TMS
is held low.


9.2.4.12 Shift-IR State 


In this state the shift register contained in the
instruction register is connected between TDI and
TDO and shifts data one stage towards its serial
output on each rising edge of TCK.


The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


When the controller is in this state and a rising edge
is applied to TCK, the controller enters the Exit1-IR
state if TMS is held high, or remains in the Shift-IR
state if TMS is held low.


9.2.4.13 Exit1-IR State


This is a temporary state. While in this state, if TMS
is held high, a rising edge applied to TCK causes the
controller to enter the Update-IR state, which
terminates the scanning process. If TMS is held low and a
rising edge is applied to TCK, the controller enters
the Pause-IR state.
The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


9.2.4.14 Pause-IR State 


The pause state allows the test controller to tempo- 
rarily halt the shifting of data through the instruction 
register. 


The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


The controller remains in this state as long as TMS 
is low. When TMS goes high and a rising edge is 
applied to TCK, the controller moves to the Exit2-IR 
state. 


9.2.4.15 Exit2-IR State


This is a temporary state. While in this state, if TMS
is held high, a rising edge applied to TCK causes the
controller to enter the Update-IR state, which
terminates the scanning process. If TMS is held low and a
rising edge is applied to TCK, the controller enters
the Shift-IR state.


The test data register selected by the current
instruction retains its previous value during this state. The
instruction does not change in this state.


9.2.4.16 Update-IR State 


The instruction shifted into the instruction register is 
latched onto the parallel output from the shift-regis- 
ter path on the falling edge of TCK. Once the new 
instruction has been latched, it becomes the current 
instruction. 


Test data registers selected by the current
instruction retain the previous value.
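The state descriptions above can be combined into a complete instruction scan. The Python sketch below builds the (TMS, TDI) pair for each TCK rising edge needed to load a 4-bit instruction, starting and ending in Run-Test/Idle. Because the most significant instruction register bit is connected to TDI and the least significant bit to TDO, the least significant bit is shifted in first. The function name is hypothetical, and the sketch assumes, as in IEEE Std. 1149.1, that the final bit is shifted on the same edge that leaves Shift-IR.

```python
def ir_scan(code: int, width: int = 4):
    """Return a list of (tms, tdi) pairs, one per TCK rising edge, that
    loads `code` into the instruction register from Run-Test/Idle."""
    # Run-Test/Idle -> Select-DR-Scan -> Select-IR-Scan -> Capture-IR -> Shift-IR
    seq = [(1, 0), (1, 0), (0, 0), (0, 0)]
    for i in range(width):
        bit = (code >> i) & 1              # least significant bit first
        last = i == width - 1
        seq.append((1 if last else 0, bit))  # TMS = 1 on the last bit: Exit1-IR
    # Exit1-IR -> Update-IR -> Run-Test/Idle
    seq += [(1, 0), (0, 0)]
    return seq


# RUNBIST ("0111") as an example instruction code.
for tms, tdi in ir_scan(0b0111):
    print(tms, tdi)
```

For RUNBIST, leaving TMS low after this sequence keeps the controller in Run-Test/Idle, which is where the text says BIST executes.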


9.2.5 BOUNDARY SCAN REGISTER CELL 


The boundary scan register for each component 
contains a cell for each pin, as well as cells for con- 
trol of I/O and tristate pins. 


9.2.5.1 82495XP Boundary Scan Register Cell 


The following is the bit order of the 82495XP bound- 
ary scan register: (from left to right and top to bot- 
tom) 


- EADS# NA# 
TDI -> MKEN# KWEND# SWEND# BGT#
CNA# BRDY# RESERVED CRDY# MWBWT#
DRCTM# MRO# CWAY# FPFLD# SNPCYC#
SNPBSY# MHITM# MTHIT# CAHOLD FSIOUT#
PALLC# SNPADS# CADS# CDTS# CWR#
CDC# CMIO# RDYSRC MCACHE# KLOCK#
SMLN# NENE# CFA3 CFA2 TAG11 TAG10 TAG9
TAG8 TAG7 TAG6 TAG5 TAG4 TAG3 TAG2 TAG1
TAG0 SET10 SET9 SET8 SET7 CLK SET6 SET5
SET4 SET3 SET2 SET1 SET0 CFA6 CFA5 CFA4
CFA1 CFA0 ADS# LEN BLAST# BRDYC1#
BRDYC2# CACHE# LOCK# BLE# BOFF# KEN#
AHOLD WR# MIO# DC# PWT PCD HITM# PCYC
INV WBWT# WAY WRARR#
MCYC# BUS# MAWEA# WBWE# WBA WBTYP
MCFA0 MCFA1 MCFA4 MCFA5 MCFA6 MSET0
MSET1 MSET2 MSET3 MSET4 MSET5 MSET6
MSET7 MSET8 MSET9 MSET10 MTAG0 MTAG1
MTAG2 MTAG3 MTAG4 MTAG5 MTAG6 MTAG7
MTAG8 MTAG9 MTAG10 MTAG11 MCFA2 MCFA3
RESET MAOE# MBAOE# SNPCLK SNPSTB#
EWBE# MPIC# SNPINV FLUSH# SYNC#
SNPNCA MBALE MALE MACTL OCTL CFA4CTL
CFA5CTL CACTL FPFLDCTL WBWTCTL
NACTL -> TDO


"RESERVED" signals correspond to no connect
"NC" signals on the 82495XP.


EWBE# and MPIC# will be implemented in the
82495XP B-stepping; omit them from the boundary
scan register for A-stepping 82495XPs.


All the *CTL cells are control cells that are used to
select the direction of bidirectional pins or tristate
output pins. If "1" is loaded into the control
cell (*CTL), the associated pin(s) are tristated or
selected as input. The following lists the control cells
and their corresponding pins.


1. MACTL controls the MSET0-10, MTAG0-11,
   and MCFA0-6 pins.

2. OCTL controls the WAY, WRARR#, MCYC#,
   MAWEA#, BUS#, WBWE#, WBA, WBTYP, INV,
   EADS#, AHOLD, KEN#, BOFF#, BLE#,
   BRDYC2#, BRDYC1#, BLAST#, NENE#,
   SMLN#, KLOCK#, MCACHE#, RDYSRC,
   CMIO#, CDC#, CWR#, CDTS#, CADS#,
   SNPADS#, PALLC#, FSIOUT#, CAHOLD,
   MTHIT#, MHITM#, SNPBSY#, SNPCYC#,
   CWAY, EWBE#, and MPIC# output pins.

3. CFA4CTL controls the CFA4 pin.

4. CFA5CTL controls the CFA5 pin.

5. CACTL controls the SET0-10, TAG0-11,
   CFA0-3, and CFA6 pins.

6. FPFLDCTL controls the FPFLD# pin.

7. WBWTCTL controls the WB/WT# pin.

8. NACTL controls the NA# pin.
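

The control-cell list above can be turned into a lookup table when writing boundary scan test vectors. The following Python sketch is illustrative only: the names `expand`, `CONTROL_CELLS`, `PIN_TO_CELL`, and `pin_is_driven` are hypothetical, pin names are normalized (for example MSET0 through MSET10), and the long OCTL group is left out for brevity.

```python
def expand(prefix, lo, hi):
    """Expand a pin range such as MSET0-10 into individual pin names."""
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]

# Control cell -> pins it governs, transcribed from the list above.
# OCTL is omitted here only to keep the sketch short.
CONTROL_CELLS = {
    "MACTL": expand("MSET", 0, 10) + expand("MTAG", 0, 11) + expand("MCFA", 0, 6),
    "CFA4CTL": ["CFA4"],
    "CFA5CTL": ["CFA5"],
    "CACTL": expand("SET", 0, 10) + expand("TAG", 0, 11) + expand("CFA", 0, 3) + ["CFA6"],
    "FPFLDCTL": ["FPFLD#"],
    "WBWTCTL": ["WB/WT#"],
    "NACTL": ["NA#"],
}

# Reverse map: pin -> controlling cell.
PIN_TO_CELL = {pin: cell for cell, pins in CONTROL_CELLS.items() for pin in pins}

def pin_is_driven(pin, cell_values):
    """A pin is driven only when its control cell holds 0.

    Loading 1 into the control cell tristates the pin or selects it as input.
    """
    return cell_values[PIN_TO_CELL[pin]] == 0
```

For example, `PIN_TO_CELL["MTAG11"]` gives `"MACTL"`, so loading 1 into MACTL floats that pin together with the rest of the memory address group.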


9.2.5.2 82490XP Boundary Scan Register Cell


The following is the bit order of the 82490XP boundary
scan register: (from left to right and top to bottom)


TDI -> CDCTL WR# BLAST# BRDYC#
BRDY# HITM# ADS# BE# A0 A1 A2 A3 A4 A5 A6
A7 A8 A9 A10 A11 A12 A13 A14 A15 MDATA7
MDATA3 MDATA6 MDATA2 MDATA5 MDATA1
MDATA4 MDATA0 MDCTL MDOE# MZBT#
MBRDY# MOEC# MFRZ# MSEL# MCLK MOCLK
RESET PAR# RESERVED BOFF# WBTYP WBA
WBWE# BUS# MAWEA# MCYC# CRDY#
WRARR# WAY CDATA4 CDATA0 CDATA2
CDATA5 CDATA6 CDATA1 CDATA3
CDATA7 -> TDO


"RESERVED" signals correspond to no connect
"NC" signals on the 82490XP.


All the *CTL cells are control cells that are used to
select the direction of bidirectional pins or tristate
output pins. If "1" is loaded into the control
cell (*CTL), the associated pin(s) are tristated or
selected as input. The following lists the control cells
and their corresponding pins.


1. CDCTL controls the CDATA0-7 pins.
2. MDCTL controls the MDATA0-7 pins.


9.2.6 TAP CONTROLLER INITIALIZATION 


The TAP controller is automatically initialized when a
device is powered up. In addition, the TAP controller
can be initialized by applying a high signal level on
the TMS input for five TCK periods.
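

The five-clock rule follows from the shape of the TAP state diagram. The Python sketch below encodes the 16-state transition table as defined by IEEE Std 1149.1 (state names follow that standard; the table is not transcribed from this data sheet) and lets one check that five rising TCK edges with TMS high lead to Test-Logic-Reset from any starting state. The names `TAP_NEXT` and `clock` are illustrative.

```python
# state: (next state when TMS = 0, next state when TMS = 1), sampled on rising TCK
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}

def clock(state, tms_bits):
    """Apply one rising TCK edge per TMS value and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state

# Five TCK periods with TMS high reach Test-Logic-Reset from every state.
reset_ok = all(clock(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)
```

Some states need all five clocks: starting in Shift-DR, four clocks with TMS high only reach Select-IR-Scan.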


9.2.7 BOUNDARY SCAN SIGNAL DESCRIPTION 
AND TIMINGS 


The functionality of TDI, TMS, TDO, and TCK are 
described in Chapter 7. The A.C. timing specifica- 
tions for the boundary scan signals are located in 
Chapter 10. 


9.3 Tri-State Output Test Mode 


The 82495XP has the ability to tri-state all of its outputs
and bidirectional pins and to disable all pull-ups
and pull-downs. During tri-state output test mode, all
pins floated during bus hold, as well as those which
are never floated during normal operation, are
tri-stated. When the 82495XP is in tri-state output
test mode, external testing can be used to test
board interconnections.


On the 82495XP, tri-state output test mode is invoked
by driving HIGHZ# (MBALE) and SLFTST# (CRDY#)
active to the 82495XP at least 10 clocks
prior to the deassertion of RESET. Note that
HIGHZ# has priority over SLFTST#. When both
HIGHZ# and SLFTST# are driven active, the
82495XP will invoke the tri-state output mode and
not invoke BIST.


Once tri-state output test mode is invoked, the 
82495XP remains in it until the next RESET. 


9.4 82490XP Cache SRAM Testing 


The 82490XP cache SRAM can be tested using 
standard cache memory testing techniques. Code 
must be written to: 


1. Flush and reset the 82495XP/82490XP/CPU
   cache.

2. Write 1's to every bit of a block of memory equal
   to the cache size.

3. Read the block of memory to fill the cache, tagging
   the data as read-only using the MRO# signal.

4. Write 0's to every bit in the block of memory.

5. Read the block; the cache hits should be all 1's.

6. Repeat the process, exchanging 0 for 1 and 1 for 0.


In this example, the code to test the cache must be
non-cacheable to the 82495XP. Also, the CPU
cache must be on so that the 82495XP will perform
line-fills.
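

The six steps above can be sketched against a toy software model. The Python below is not test code for real hardware: `ReadOnlyCacheModel`, `check_pattern`, `check_cache`, and the fault-injection argument `stuck_at_zero` are hypothetical, and the model assumes that a line filled with MRO# asserted is not updated by a later write, which is what step 5 relies on.

```python
class ReadOnlyCacheModel:
    """Toy model of memory plus a cache whose lines are filled read-only."""

    def __init__(self, size, stuck_at_zero=None):
        self.memory = [0] * size
        self.cache = {}
        # address -> bit mask of cache SRAM bits that always read 0 (injected fault)
        self.stuck_at_zero = stuck_at_zero or {}

    def flush(self):
        self.cache.clear()

    def write(self, addr, value):
        # Assumption: a write to a read-only line reaches memory only.
        self.memory[addr] = value & 0xFF

    def read(self, addr):
        if addr not in self.cache:  # miss: line fill, tagged read-only
            bad = self.stuck_at_zero.get(addr, 0)
            self.cache[addr] = self.memory[addr] & ~bad & 0xFF
        return self.cache[addr]


def check_pattern(model, pattern):
    """Run steps 1 to 5 for one byte pattern; return the failing addresses."""
    size = len(model.memory)
    model.flush()                              # step 1
    for a in range(size):
        model.write(a, pattern)                # step 2
    for a in range(size):
        model.read(a)                          # step 3: fill the cache
    for a in range(size):
        model.write(a, pattern ^ 0xFF)         # step 4: memory now holds the inverse
    return [a for a in range(size) if model.read(a) != pattern]  # step 5


def check_cache(model):
    """Step 6: run the test with all 1's, then with all 0's."""
    return sorted(set(check_pattern(model, 0xFF)) | set(check_pattern(model, 0x00)))
```

Because memory holds the inverted pattern at step 5, any read that is served from memory instead of the cache, or any cache bit that failed to hold its value, shows up as a mismatch.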


10.0 AC/DC SPECIFICATIONS 


10.1 Background 


The 82495XP has four main interfaces: CPU Bus, 
memory bus controller, memory bus, and 82490XP. 
The memory bus controller is typically implemented 
with PLD devices. The MBC interface signal timings 
are, therefore, generated based on available, off- 
the-shelf PLD specs. The memory bus interface was 
specified to suit a generic memory interface which 
works up to CPU frequency. 
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10.2 D.C. Specifications 


Table 10-1. D.C. Specifications 
Vcc = 5V ±5%, Tcase = 0 to +85°C


| Symbol | Parameter — 
Input Low Voltage | 


Input High Voltage 2.0 


Trt 
Crrttevei@) id 


mA 82495XP @ 50 MHz, (3) 
82490XP @50MHz 

Ww 82495XP @ 50 MHz, (4) 
82490XP @ 50 MHz 


A | 0<Vm> Veo 


a 
+0.8 
Vcc +0.3 

0.45 


ad 


<i 


—0.3 
2.0 


<< 


VoL Output Low Voltage 


= 
a 


i 
f 


< 


VoH Output High Voltage 


~ Power Supply Current 
ace Power Dissipation 


Input Leakage Current 


950 
300 


2.75 
1.50 


£15 


ILo Output Leakage Current +15 uA O< Vout < Vcc Tristate 
Input Leakage Current 


200 uA — Vin = 0.45V, (5) = 


F for82495XP 
| for 82490XP : 


for 82495XP 

for 82490XP 

for 82495XP | 
for 82490XP oe 


Cin Input Capacitance p 
p 
p 
p 
p 
p 
p 


ai, 
tt 


Output Capacitance 
|/O Capacitance 


4 
or © 


oe 


hk cack 
O71 © 


Cok CLK Input Capacitance 14 - for 82495XP | 
: ) 5 for 82490XP | 
CTIN Test Input Capacitance | a 

| “for 82490XP re. 
Crout | Test Output Capacitance | 


F 
F 
F 
F for 82495XP 
F 
: 


| for 82495XP : , 
for 82490XP 
for 82495XP 
for 82490XP 
NOTES: —_ ae 
(1) Parameter measured at 4 mA Iload.
    For MCFA6-MCFA0, MSET10-MSET0, and MTAG11-MTAG0, this parameter is measured at 16 mA Iload.
(2) Parameter measured at 1 mA Iload.
    For MCFA6-MCFA0, MSET10-MSET0, and MTAG11-MTAG0, this parameter is measured at 2 mA Iload.
(3) Typical supply current is 400 mA.
(4) Typical power dissipation is 2W.
(5) This parameter is for input with pullup.


Test Clock Capacitance 


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10.3 A.C. Specifications 
All TTL timing specs are measured at 1.5V for both “0” and “1” logic level. 
Table 10-2. Clock, Reset, and Configuration 


Vcc = 5V ±5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified. 
Minimum CL = 20 pF unless otherwise specified. 
All Inputs and Outputs are TTL Level. 


CLK MCLE. MOCLK High Tine | 


= 
Oo 
>) 


% 


ae 
N 


ns 10-1 


RESET Duration 8xt2 
iy 15xt2 


All Configurations CFG3-—CFGO, 10x12 
CPUTYP, SNPMD, PLOCKEN, 
MEMLDRV, 82490XPLDRV, HIGHZ#, 
SLFTST # Setup Time 


in 
20 
7 
7 
7 
2 


tO 
t1 
t2 
t3 
t4 
t5 
t7 
t8 
a 


t 


10-4 | for 82495xXP, (2) 
for 82490XP 

10-4 | (3), (4) © 

10-4 | (3), (5) 


10-3 for 82495xP, (6) 
ns 10-3 for 82495XP, (7) 


(8) | 


t10 


| t11 All Configurations CFG3-—CFGO, 
| CPUTYP, SNPMD, PLOCKEN, 


MEMLDRV, 82490XPLDRV, HIGHZ#, 
SLFTST # Hold Time | 


FLUSH #, SYNC # Setup Time 
FLUSH#, SYNC # Hold Time 
FLUSH#, SYNC# Duration 2 


t12 
t13 
t14 


x 
+ 
— | Oyo 


NOTES:
(1) Rise/Fall, High/Low times measured between 0.8V and 2.0V.
(2) Power up reset duration should be 1 ms after Vcc and CLK are stable. If configuration inputs with pullups are left floated,
    10 us RESET duration is required.
(3) Timing is referenced to reset falling edge.
(4) 8 ns setup time is required to guarantee recognition on next clock.
(5) 1 ns hold time is required to guarantee recognition on next clock.
(6) To guarantee recognition on next clock.
(7) Synchronous mode only.
(8) Asynchronous mode only. To guarantee recognition.


1 
2 
2 
7 ‘ 


2 
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Table 10-3. Memory Bus Controller 82495XP/82490XP Interface 


Vcc = 5V ±5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified. 
Minimum CL = 20 pF unless otherwise specified. 

All Inputs and Outputs are TTL Level. 


Symbol |____ Parameter | Min | Max | Unit | Figure | Notes 
130 BRDY #, CRDY#, KWEND#, SWEND#, 10-3 | 82495XP Only 
| BGT #, CNA#, [WRMRST] Setup Time 
}t30a_—« | BRDY#,CRDY# SetupTime 82490XP Only 
131 BRDY #, CRDY#, KWEND#, SWEND#, 
| BGT#, CNA#, [WRMRST] Hold Time 


CW/R#, CD/C#, CMI/O#, RDYSRC, 
MCACHE #, KLOCK #, BLE#, PALLC#, 
CAHOLD, CWAY, FSIOUT #, CADS#, 
CDTS#, SNPADS # Valid Delay 


NENE #, SMLN# Valid Delay 


t34 MDATA Setup to CLK (clock before 
BRDY # active) 


MDATA ‘Valid Delay from CLK (CLK from 
CDTS# valid, MDOE# active) 

MDATA Valid Delay from MDOE # active 
MDATA Fload Delay from MDOE # inactive 


Vcc = 5V ±5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified. 
Minimum CL = 20 pF unless otherwise specified. 
All Inputs and Outputs are TTL Level. 


[unt [rows | wotes 
Cra 
Pee [ior fay 
= 


MCFA6-MCFAO, MSET10—MSETO, 13 10-5 (2),(3) 
MTAG11-MTAGO Vaiid Deiay — 

MCFA6—MCFAO, MSET10—MSETO, 2 15 10-5 (4) 

MTAG11-—MTAGO Float Delay © Pee | 
MCFA6—MCFAO, MSET10—MSETO, 
MTAG11-MTAGO Valid Delay . | 


Synbor_ 
[ise | snpcuxrisotime ——S—=~dSCi‘it | | 


Bol Ra ol Ela Loci 
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Table 10-4. 82495XP Memory Interface (Continued) 


Vcc = 5V ±5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified. 
Minimum CL = 20 pF unless otherwise specified. 
All Inputs and Outputs are TTL Level. 


[Symbol [| Parameter————=—*| Min | Max | Unit | Figure | Notes 


MCFA6—MCFAO, MSET10—MSETO, MTAG11- 
MTAGO, SNPINVV, SNPNCA, MAOE#, 
MBAOE#, SNPSTB# Setup Time 


MCFA6—MCFAO, MSET10—MSETO, MTAG11- 
MTAGO, SNPINV, SNPNCA, MAOE #, MBAOE # 
Setup Time 


MCFA6-—MCFAO, MSET10—MSETO, MTAG11- 
MTAGO, SNPINV, SNPNCA, MAOE#, MBAOE#, 
SNPSTB # Setup Time 


MCFA6-MCFAO, MSET10—MSETO, MTAG11- 
MTAGO, SNPINV, SNPNCA, MAOE#, MBAOE #, 
SNPSTB # Hold Time 


MCFA6—MCFAO, MSET10—MSET0, MTAG11—- 
MTAGO, SNPINV, SNPNCA, MAOE#, MBAOE # 
Hold Time . 


MCFA6—MCFAO, MSET10-MSETO, MTAG11— 
MTAGO, SNPINV, SNPNCA, MAOE#, MBAOE#, 
SNPSTB # Hold Time 


ise [ snpsraeseuptime ———SsS—=~i | ds | to 
ries [ snpsreerodtime ——SS—=~s at | dos | 108 

t66 
67 


| t66 | SNPSTB# Active/Inactive Time | a ft fons | 10-3 | @) 
t MRO#, MKEN#, DRCTM#, MWB/WT# Setup 10-3 
Time | | 
168 MRO #, MKEN#, DRCTM#, MWB/WT#¥ Hold 1 10-3 
Time ! re | 
169 MTHIT#, MHITM#, SNPBSY#,SNPCYC# | 2 13 10-2 
Valid Delay : | 


NOTES:
(1) Rise/fall times measured between 0.45V and 2.4V
(2) See capacitive derating curves for loads above the 50 pF specification
(3) Valid delay from MAOE#, MBAOE# going active (low)
(4) Float delay from MAOE#, MBAOE# going inactive (high)
(5) Valid delay from MALE or MBALE if both MAOE#, MBAOE# are active
(6) Valid delay from CLK only if MALE or MBALE, MAOE# and MBAOE# are active
(7) a. In clocked mode referenced to SNPCLK rising edge
    b. In strobed mode referenced to SNPSTB# falling edge
    c. In synchronous mode, refer to CLK
(8) Asynchronous clocked mode only. Timings referenced to SNPCLK
(9) Asynchronous signal. Time to guarantee recognition on next clock
(10) SNPCLK is only used for the clocked memory bus mode
(11) t51 > t2
(12) This parameter is valid either from SNPCLK or CLK
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Table 10-5. 82490XP Clocked Mode 


Vcc = 5V ±5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified. 
Minimum CL = 20 pF unless otherwise specified. 

All Inputs and Outputs are TTL Level. 


Vcc = 5V ±5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified.
All Inputs and Outputs are TTL Level.


— 
rh 


— 
NO 


MSEL# High time for restart 
_MSEL# Setup before transition on MxSTB : 


— 
Be 


MSEL# Hold after transition on MEOC# 
MxSTB transition to/from MEOC # falling transition 


tb 
oO 


/ 496 | MZBT# Setup to MSEL# or MEOG falling edge 


eo 


MZBT # Hold from MSEL# or MEOC # falling edge 2 


~ 


eo 
, wah, 


~ 


MFRZ# Setup to MEOC # falling edge 
MFRZ# Hold from MEOC # falling edge 


: 
186 | 

87 | = 

188 | 3 
189 i | 

190 | : | 7 
‘91 a 

192 ji : 

192 iti : 

196 

t97 i 

| 

| 
|t102__ | MDATA Valid Delay from MxSTB transition | 


t101 MDATA Hold from MxSTB or MEOC # falling transition ! 
t103 MDATA Valid Delay from MEOC # falling transition or ; 
MSEL# deactivation 


~ NOTE: . 
(1) Rise/Fall times are measured between 0.8V and 2.0V 
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Table 10-7. Test Mode 


Vcc = 5V ±5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified. 

All Inputs and Outputs are TTL Level. 


[Symbot[ Parameter (in | ax | Unit 
Pivzo | ToKFremeny——SSC~dCSSd 
rinai | ToKPaiod SSCS 
Pitze | toKtightine ——SSSCS~sd td 
fivzs | roKtowtime ——SSSCS~wt | 
fiiza | ToKRisetine ——SSSCS~dSSCid 
ritas | ToKratime SSS 
Pivzs | To.twsseuptine ——SSOS~i Pw 
fz7 | ro.twsHodtine Sid 
[re 
a 
a 
Pitst | AliOutpus Float Qeley SSS 8 | 


NOTES: 

(1) Rise/Fall times are measured between 0.8V and 2.0V. Rise/Fall times can be relaxed by 1 ns per 10 ns increase in TCK
    period.

(2) TCK period = CLK period 

(3) Parameter measured from TCK 


t2,51,71 


CLK, SNPCLK 


! | Nees 240956-49 


Signal 


240956-50 
tx = 16, 32, 33, 35, 36, 44, 45, 60, 69 


Figure 10-2. Valid Delay Timings 




SNPSTB# 


CLK = 
XY 


‘240956-51 


Signal 


Signal
tx = 30, 62a, 62c, 64, 67, 76, 85
ty = 31, 63a, 63c, 65, 68, 77, 86


Figure 10-3. Setup and Hold Timings

Figure 10-3a. Setup and Hold Timings in Strobed Snooping Mode


ae aay a £10 


Ks VALID ~ y | 


Figure 10-4. Reset and Configuration Timings 


t11. 


Config Es 0 
240956-53 


MAOE#, 
MBAOE# 


240956-54 


240956-55 


Figure 10-6. Active/Inactive Timing 
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t96,98, 97,99, 101 
100 


Signal 


240956-—56 


Figure 10-7. Setup and Hold Timing 


240956-—57 


Figure 10-8. Setup and Hold Timing 


XK — 


OXY 


240956-58 


Figure 10-9. Valid Delay Timing — 
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Figure 10-10. Test Timings 
Introduction


The i860™ 64-bit microprocessor is a general-purpose
CPU with on-chip integer unit, floating point, memory
management, caches, and graphics. The i860 microprocessor
supports 3-D graphics software with the following
functions:


1. Hidden surface elimination
2. Distance interpolation
3. Intensity interpolation for 3-D shading


The fzchks (Z-buffer Check) and pst (Pixel Store) instructions
expedite hidden surface elimination. Distance
interpolation is accomplished with faddz (Add
with Z merge), and intensity interpolation occurs with
faddp (Add with Pixel Merge). The purpose of this
application note is to illustrate the intended use of these
instructions in a manner independent of any graphics
environment in which the instructions might be used. It
is not the purpose of this application note to present the
most efficient instruction sequences. While the inner
loop of Example 7 has as few instructions as logically
possible, the other examples are intended to present
general concepts, not optimum implementations. Tuning
for maximum performance depends on the specific
environment.


This application note assumes familiarity with the
i860™ 64-bit Microprocessor Programmer's Reference
Manual (Intel order number 240329); the i860 microprocessor
instructions for graphics are detailed in
section 6.6.


1.0 3-D RENDERING 


This series of examples consists of routines that might be used
at the lowest level of a graphics software system to convert
a machine-independent description of a 3-D image
into values for the frame buffer of a color video display.
Typically, higher-level graphics routines represent an
object as a set of polygons that together roughly describe
the surfaces of the objects to be displayed. The
graphics system maintains a database that describes
these polygons in terms of their colors, properties of
reflectance or translucence, and the locations in 3-D
space of their vertices. Due to the roughness of the
representation, the amount of information in the database
is considerably less than that which must be delivered
to the video display. A rendering procedure, such
as Example 7, uses interpolation to derive the detailed
information needed for each pixel in the graphics frame
buffer. The rendering procedure also performs pixel-by-pixel
hidden-surface elimination.


The focus of this series of examples is Example 7,
which operates on a segment of a scan line. The segment
is bounded by two points of given location and
color: from point (X1, Y0, Z1) with color intensities
Red1, Grn1, Blu1 to point (X2, Y0, Z2) with color intensities
Red2, Grn2, Blu2. The points and color intensities
are determined by higher-level graphics software.
The points represent the intersection of the scan line
with two edges of the projected image of a polygon. For
a given scan line, the rendering procedure is executed
once for each polygon that projects onto that scan line.
The higher-level graphics software is responsible for
orienting the objects with respect to the viewer, for
making perspective calculations, for scaling, and for
determining the amount of light that falls on each polygon
vertex.


The 16-bit pixel format is used, giving ample resolution
for color shading: 2^6 intensity values for red, 2^6 intensity
values for green, and 2^4 intensity values for blue.
Example 1 shows how to set the pixel size. For hidden-surface
elimination, the Z-buffer (or depth buffer) technique
is employed, each Z value having a resolution of
16 bits.


Because the examples presented here use almost all of
the registers of the i860 microprocessor, the registers
are given symbolic names, as defined by Example 2. In
a real application, it is likely that some of the inputs to
the rendering procedure would be passed in floating-point
registers instead of the integer registers employed
here. The register allocation shown in Example 2 simplifies
the examples by avoiding the need to use any
register for multiple purposes.


// SET PIXEL SIZE TO 16
        ld.c     psr, Ra            // Work on psr
        andnoth  0x00C0, Ra, Ra     // Clear PS
        orh      0x0040, Ra, Ra     // PS = 16-bit pixels
        st.c     Ra, psr            //


Example 1. Setting Pixel Size
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// REGISTER DEFINITIONS FOR RENDERING PROCEDURE 
d/ INTEGER LOCALS 
Ra r4 // Temporary 
r5 // Temporary | 
r6 // Temporary 
r7 // Temporary 
INTEGER INPUTS 
rl6 // X coordinate of cearbune point of line segment in pixels 
r17 Width of scan line segment in number of pixels © 
rls Z=-buffer pointer to the current line segment 
r19 Initial Z value, fixed-point 16.16 format 
r20 Z Slope, fixed-point 16.16 format 
r2l _. Graphics frame buffer pointer to the current line segment 
r22 Initial red intensity, fixed-point 6.10 format, plus .5 
r23 Initial green intensity, fixed-point 6.10 format, plus .5 
r24 Initial blue intensity, fixed-point 6.10 format, plus .5 
r25 Red Slope, fixed=point 6.10 format 
r26 Green slope, fixed-point 6.10 format 
r27 // Blue slope, fixed-point 6.10 format 
AL LOCALS 
an | HS OUMNTELES Z values 


SII HT UW Um Ub a a om a ot 


Z interpolant, * coefficient 1.0 
iZlh 
1Z3 | 
iZSh 
oldz 
newzZ 
newzh = 


Z interpolant, coefficient 3.0 


Original values from the Z-buffer 
New Z=-buffer values 


New pixel vainibs 
Red interpolant, coefficient 4. Oo 


Acoumilated red intensities 

éve ca fhtompolant: coefficient 4.0 
Accumulated green intensities 

Blue interpolant, coefficient 4.0 


Accumulated blue intensities 


| // left-end Z mask 
1Zmaskh Sf 
rZmask f4/ right- eenn Z mask 
rZmaskh // 


Example 2. Register Assignments 
2.0 DISTANCE INTERPOLATION


To perform hidden surface elimination at each pixel,
the rendering routine first interpolates the value of Z at
each pixel. Distance interpolation consists of calculating
the slope of Z over the given line segment, then
increasing the Z value of each successive pixel by that
amount, starting from X1. The width of the line segment
in pixels is

    dX = X2 - X1

Calculate the reciprocal of dX:

    RdX = 1/dX

The value of dX is used several times as a divisor. It is
most efficient to calculate its reciprocal once, then, instead
of dividing by dX, multiply by RdX. The slope of
Z is

    mZ = (Z2 - Z1)*RdX

Because each polygon is a plane, the value of mZ is
constant for all scan lines that intersect the polygon;
therefore mZ needs to be calculated only once for each
polygon. Example 7 assumes that dX and mZ have already
been calculated, and all that remains is to apply
mZ to successive pixels. Let Z(Xn) be the Z value at
pixel Xn. Then

    Z(X1)      = Z1
    Z(X1 + 1)  = Z1 + mZ
    Z(X1 + 2)  = Z1 + 2*mZ
      ...
    Z(X1 + N)  = Z1 + N*mZ
      ...
    Z(X1 + dX) = Z1 + dX*mZ = Z(X2)


Figure 1 illustrates this Z-value interpolation.
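

The recurrence Z(X1 + N) = Z1 + N*mZ can be tried out in a few lines of Python using the 16.16 fixed-point format that the register definitions give for Z values. This is a sketch of the arithmetic only, not of the i860 instruction sequence; the function names are illustrative, and the segment width of 12 pixels in the usage note is an assumed value.

```python
FRAC_BITS = 16  # 16.16 fixed point: 16 integer bits, 16 fraction bits

def to_fixed(x):
    """Convert a real value to 16.16 fixed point."""
    return int(round(x * (1 << FRAC_BITS)))

def z_values(z1, z2, dx):
    """Integer Z values at pixels X1, X1+1, ..., X1+dX."""
    rdx = 1.0 / dx                   # RdX = 1/dX, computed once
    mz = to_fixed((z2 - z1) * rdx)   # mZ = (Z2 - Z1)*RdX
    acc = to_fixed(z1)               # accumulator starts at Z1
    out = []
    for _ in range(dx + 1):
        out.append(acc >> FRAC_BITS)  # the 16-bit Z is the integer part
        acc += mz                     # one addition per pixel, no division
    return out
```

With Z1 = 2400, Z2 = 3000 and an assumed width of 12 pixels, `z_values(2400, 3000, 12)` starts at 2400, advances by 50 per pixel, and ends at 3000.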


[Figure 1. Z-Buffer Interpolation: a triangle with vertex
depths z = 4000, z' = 800, and z" = 1000; on the scan line
shown, Z1 = 2400, Z2 = 3000, and mZ = (3000 - 2400)
divided by the segment width in pixels.]


The faddz instruction helps to perform the above calculations
64 bits at a time. Because a Z value is 16 bits
wide, Example 7 operates on the Z buffer in groups of
four. The faddz instruction, however, treats the interpolation
values (N*mZ) as 32-bit fixed-point numbers;
therefore, two faddz instructions are executed for each
group of four pixels. Because of the way faddz shifts
the MERGE register, the first faddz corresponds to
even-numbered pixels, while the second corresponds to
odd-numbered pixels. Instead of starting with the value
for the first pixel (Z(X1)) and adding mZ to each pixel
to produce the value for the next pixel, the example
procedure starts with the values for the first two even-numbered
pixels and adds 1*mZ to each of these values
to produce the values for the adjacent odd-numbered
pair. Adding 3*mZ to each of the Z values of an odd-numbered
pair produces the values for the next even-numbered
pair. Figure 2 shows one way of constructing
the operands before starting the distance interpolations.
(The initial value given to src1 depends on the alignment
of the first pixel.) Table 1 helps to visualize the
process.


After two faddz instructions, the MERGE register
holds the Z values for four adjacent pixels (in the correct
order). The form instruction copies MERGE into
one of the 64-bit floating-point registers.
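

The alternation between the 1*mZ and 3*mZ interpolants is easier to follow with numbers. The Python sketch below models only the arithmetic on the two 32-bit accumulator fields, not the bit-level shifting of the MERGE register. It assumes an aligned first pixel, so the accumulator starts at (Z1 - 3*mZ, Z1 - 1*mZ); the function name and list layout are illustrative.

```python
def z_groups(z1, mz, groups):
    """Return Z values in pixel order, four per group, using two adds per group."""
    acc = [z1 - 3 * mz, z1 - 1 * mz]   # assumed initial src1 for an aligned start
    iz1 = [1 * mz, 1 * mz]             # interpolant with coefficient 1.0
    iz3 = [3 * mz, 3 * mz]             # interpolant with coefficient 3.0
    out = []
    for _ in range(groups):
        # First add: odd pair of the previous group -> even pair (pixels n, n+2)
        acc = [a + b for a, b in zip(acc, iz3)]
        even = acc
        # Second add: even pair -> odd pair (pixels n+1, n+3)
        acc = [a + b for a, b in zip(acc, iz1)]
        odd = acc
        # Merge step: interleave so the four values come out in pixel order
        out.append([even[0], odd[0], even[1], odd[1]])
    return out
```

With `z1 = 2400` and `mz = 50`, two groups yield 2400, 2450, ..., 2750 in order, the same values as adding mZ once per pixel.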


[Figure 2. faddz Operands: the accumulator (initial src1)
holds two 32-bit fixed-point fields, Z1 - 1.0*mZ and
Z1 - 3.0*mZ, each with an integer part and a fraction part.
The two interpolants hold 1.0*mZ and 3.0*mZ in the same
format.]


Table 1. faddz Visualization

[The table shows src1/rdest, src2, and the MERGE register
contents after each faddz; the cell values are not legible in
the source.]

Because the values of Z1 and mZ are constant for each loop through the rendering routine, the numbers shown here are
the values of the coefficient N, where the actual operands have the values Z1 + N*mZ. For each execution of faddz, src1
is the same as rdest of the prior faddz. After every two faddz instructions, a form instruction empties the MERGE register.
// CONSTRUCT INTERPOLANTS iZ1 AND iZ3 GIVEN mZ
        ixfr     mZ,            // Join each half in 64-bit register
        shl      1,             // Ra = 2*mZ
        adds     Ra,            // Ra = 3*mZ
        ixfr     Ra,            // Join each half in 64-bit register
        fmov.ss  iZ1,           // Join each half in 64-bit register
        fmov.ss  iZ3,           // Join each half in 64-bit register

The same register is used as both src1 and rdest in all
faddz instructions. This register serves to accumulate Z
values for successive pixels; therefore, it is called an
accumulator. The registers used as src2 are called interpolants.
The code in Example 3 constructs the interpolants;
it needs to be executed only once for each polygon.


[Figure 3. Pixel Interpolation for Gouraud Shading: red
color intensity (0-63) across a triangle; two vertices have
r' = 40 and r" = 40, and the slope along the scan line shown
is (27 - 30) / 12 pixels.]


3.0 COLOR INTERPOLATION 


To determine the RGB color intensities at each pixel,
the rendering routine interpolates between the color intensities
at the end points. (This rendering technique is
called "Gouraud shading" after H. Gouraud, "Continuous
Shading of Curved Surfaces," IEEE Transactions
on Computers, C-20(6), June 1971, pp. 623-628.) Let
the symbol C (color) represent either R (red), G
(green), or B (blue). Color interpolation consists of calculating
the slope of C over the given line segment, then
increasing the C values of each successive pixel by that
amount, starting from the values for X1. This must be
done for C=R, C=G, and C=B. The slope of C is

    mC = (C2 - C1)*RdX

where RdX = 1/dX.


The value of mC is constant for all scan lines that intersect
a given pair of polygon edges; therefore mC needs
to be calculated only once for each such pair. Example
7 assumes that mC has already been calculated for all
colors, and all that remains is to apply mC to successive
pixels. Let C(Xn) be a C value at pixel Xn. Then

    C(X1)      = C1
    C(X1 + 1)  = C1 + mC
    C(X1 + 2)  = C1 + 2*mC
      ...
    C(X1 + N)  = C1 + N*mC
      ...
    C(X1 + dX) = C1 + dX*mC = C(X2)

Figure 3 illustrates Gouraud shading of a triangle.


The faddp instruction performs the above calculations
64 bits at a time. Because a pixel is 16 bits wide, Example
7 operates on pixels in groups of four. Instead of
starting with the value for the first pixel (C(X1)) and
adding mC to each pixel to produce the value for the
next pixel, the example procedure starts with the values
for the first four pixels and adds 4*mC to each group of
four to produce the values for the next four. Three
faddp instructions are executed for each group of four
pixels. The first increments the blue values; the second,
green; the third, red. Figure 4 shows one way of constructing
the operands for each color before starting the
color interpolations. (The initial value given to src1 depends
on the alignment of the first pixel.)
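

The stride-of-four scheme can be checked with a short Python sketch for one color channel. It models only the arithmetic: values are 6.10 fixed point in 16-bit fields (the format given in the register definitions, with 0.5 already added to C1), and the accumulator is assumed to start one group before the first pixel, which corresponds to an aligned first pixel. The function name is illustrative, and packing the three channels into a pixel is not modeled.

```python
FRAC_BITS = 10  # 6.10 fixed point in a 16-bit field

def color_groups(c1, mc, groups):
    """Return the 6-bit intensities for 4*groups successive pixels.

    c1 -- initial intensity in 6.10 format, with 0.5 (512) already added
    mc -- slope per pixel in 6.10 format
    """
    # Four 16-bit accumulator fields holding C1 - 4*mC ... C1 - 1*mC (assumed aligned start)
    acc = [(c1 + (n - 4) * mc) & 0xFFFF for n in range(4)]
    step = (4 * mc) & 0xFFFF           # interpolant with coefficient 4.0
    out = []
    for _ in range(groups):
        # One add per group advances all four fields by 4*mC (mod 2^16 per field)
        acc = [(a + step) & 0xFFFF for a in acc]
        out.extend(a >> FRAC_BITS for a in acc)   # keep the 6 integer bits
    return out
```

For a red ramp from 30 falling by 0.25 per pixel (`c1 = (30 << 10) + 512`, `mc = -256`), three groups give twelve intensities that agree with evaluating C1 + N*mC directly for N = 0..11.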


Setup of the accumulator and interpolants is similar to 
that of the Z-buffer. The code in Example 4 constructs 
the interpolants; it needs to be executed only once for 
each pair of edges in each polygon. 



4.0 BOUNDARY CONDITIONS 


The i860 microprocessor operates on 64-bit quantities
that are aligned on 8-byte boundaries. The code in this
example takes full advantage of this design, handling
four 16-bit pixels in each loop. However, if the first or
last pixel of a line segment is not on an 8-byte boundary,
two kinds of special considerations are required:


1. Masking of Z values near the end points.
2. Initialization of the accumulators.

4.1 Z-Buffer Masking 


When either the first or last pixel of the line segment is
not at an 8-byte boundary, the rendering procedure
must mask the first or last set of new Z-buffer values
(newz) so that the Z-buffer and the frame buffer are not
erroneously updated. Sometimes both the first and last
pixels are in the same 4-pixel set, in which case either
one may not be on an 8-byte boundary. A function that
looks up and calculates masks is shown in Example 5.


Because the value 0xFFFF is used for masking, the Z-buffer
is initialized with 0xFFFE, so that the fzchks
instruction always finds the mask to be greater than
any Z-buffer contents.
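

The mask values can also be computed from the alignment, which helps when checking a lookup table. The Python sketch below is illustrative: it assumes that pixel 0 of a 4-pixel set occupies the low-order 16 bits of the 64-bit word and that the mask is ORed into the new Z values, so masked fields become 0xFFFF and never compare as closer than the Z-buffer contents. Function names are hypothetical.

```python
def left_mask(l_align):
    """Mask the l_align pixels that precede the first pixel of the segment."""
    mask = 0
    for pixel in range(l_align):
        mask |= 0xFFFF << (16 * pixel)
    return mask

def right_mask(r_align):
    """Mask the pixels that follow the last pixel of the segment."""
    mask = 0
    for pixel in range(r_align + 1, 4):
        mask |= 0xFFFF << (16 * pixel)
    return mask

def same_set_mask(l_align, r_align):
    """When both ends fall in one 4-pixel set, the two masks are ORed."""
    return left_mask(l_align) | right_mask(r_align)

def apply_mask(newz, mask):
    """Force masked 16-bit fields to 0xFFFF (assumed to be done by an OR)."""
    return (newz | mask) & 0xFFFFFFFFFFFFFFFF
```

For example, `left_mask(1)` is 0x000000000000FFFF and `right_mask(2)` is 0xFFFF000000000000; a segment covering only pixels 1 and 2 of a set gets the OR of the two.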


Accumulator 


63 , 31 


15 0 


// CONSTRUCT INTERPOLANTS iR, iG, iB GIVEN mR, mG, mB


        shl     mR,     // Multiply each color slope by four, then
        shl             // shift by 16 to put the significant
        shl             // bits into the high-order half
        shr             // Return significant 16 bits
        shr             // to low-order half. Any sign bits
        shr             // in high-order half are gone.
        or              // Join 16-bit quarters
        or              // in 32-bit register

or 

ixfr 
ixfr 
ixfr 
.fmov.ss 
fmov.ss 
fmov.ss 


Join 32-bit halves. 
in 64-bit register 


Example 4. Construction of Color Interpolants 
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emacro zmask l_align, 
// i talign, rualign - 
// Rx, Ry ~ 
data — 
align 8 
left_mask:: //low 
elong 0x00000000, 
eLong OxOOOOFFFF, 
elong OxFFFFFFFF, 
elong OxFFFFFFFTF, 
right_mask:://low 
elong OxFFFFOOOO, 
elong 0x00000000, 
elong 0x00000000, 
elong 0x00000000, 


text 
shl 
mov 
fld.d 


5, 


shl 
mov 
fld.d 


5S, 


// aligned set, then 1Zmask = 
0x8000, aX, 


andh 
be 
fxfr 
_P£xfr 
or 
ixfr 
fxfr 
fxfr 
or 
ixfr 
nop 
-endm 


L2 


Rx, 
Rx, 
1Zmaskh, 
rZmaskh, 
Rx, 
Rx, 


l_align, 
left_mask, 
l_align (Rx), 


r_align, 
right_mask, 
r_align (Rx), 


Ry, 
1Zmaskh 
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r_align, Rx, Ry 
left- and right-end aivednent [0..3] 


in 2-byte units. 
‘scratch registers | 


high 
0x00000000 
0x00000000 
0x00000000 
OxOOOOFFFF 

high 
OxFFFFFFFF 
OxFFFFFFFF 
OxFFFFO0O0O 
0x00000000 


mod 
mod 
mod 
mod 


mod 
mod 
mod 
mod 


l_align Multiply by 8 
Rx 

lZmask Load 8=byte mask 
.rialign // Multiply by 8 
Rx // 
rZmask // 


Load 8=byte mask 
// If the first and last pixels are contained in the same 64-bit 


lZmask OR rdZmask. 
// Is dX negative 
// If not, right end is in other 


ro 


lZmask, Rx 
rZmask, Ry. 


OR low-order half. 


OR high-order half 


Example 5. Z Mask Procedure 
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Table 2. Accumulator Initial Values 


Z1 — 1*mZ 
Z1 — 2*mZ 
Z1 — 3*mZ 
Z1 — 4*mZ. 


C1 — 3*mC 
C1 — 4*mC 


4.2 Accumulator Initialization — 


When the first pixel of the line segment is not at an 8-
byte boundary, initial values placed in the accumulators
(aZ, aB, aG, and aR) must be selected so that Z1,
Red1, Grn1, and Blu1 correspond to the correct pixel.
The desired result is that shown by Table 2. However,
each value is a composite of two terms: one that is
constant for each edge pair (n*mZ, n*mR, n*mG,
n*mB) and one that can vary with each scan line (Z1,
Red1, Grn1, Blu1). The example assumes that the constant
values have all been calculated and stored in a
memory table of the format shown by Table 3. At the
beginning of each line segment the values appropriate
to the alignment of the line segment are retrieved from
the table and added to the initial Z and color values, as
shown in Example 6.


5.0 THE INNER LOOP 


Once the proper preparations have been made, only a 
minimal amount of code is needed to render each scan- 


Z1 — 3*mZ 
Z1 — 4*mZ 
Z1 — 5*mZ 
Z1 — 6*mZ. 


C1 — 4*mC - 
C1 — 5*mC 
C1 — 6*mC 


line segment of a polygon. The code shown in Example 
7 operates on four pixels in each loop. The left and
right ends of the line segment go through different logic
paths so that the Z-buffer masks can be applied by the
form instruction. All the interior points are handled by
the tight inner loop.


The controlling variable dX is zero-relative and is ex- 
pressed as a number of pixels. The value of dX also 
indicates alignment of the end-points with respect to 
the 4-pixel groups. Unaligned left-end pixels are sub- 
tracted from dX before entering the inner loop; there- 
fore, subsequent values of dX indicate the alignment of 
the right end. A value that is 3 mod 4 indicates that the 
right end is aligned, which explains the test for a value 
of -5 near the end of the loop (-5 mod 4 = 3). The
fact that the value -5 is loaded into register Rb on
every execution of the loop does not represent a pro- 
gramming inefficiency, because there is nothing else for 
the core unit to do at that point anyway. 
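
The alignment test on dX can be checked with a few lines of arithmetic. The sketch below is only an illustration of the "3 mod 4" rule described above, not a transcription of the assembly; the function name is hypothetical. It relies on the fact that in two's-complement arithmetic the low two bits of a negative value give its residue mod 4, which Python's `%` and `&` operators both reproduce.

```python
def right_end_aligned(dx):
    """True when dX is 3 mod 4, i.e. the right end falls on a 4-pixel boundary."""
    return dx % 4 == 3

if __name__ == "__main__":
    for dx in (7, 3, -1, -5, -6):
        # dx & 3 extracts the same residue from the two's-complement bit pattern
        print(dx, dx % 4, dx & 3, right_end_aligned(dx))
```

Because the loop subtracts 4 from dX on every pass, the residue never changes inside the loop, so comparing against the single constant -5 is enough to detect an aligned right end.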
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// ACCUMULATOR INITIALIZATION TABLE 
.data; .align .double
acc_init_tab:: .double [16] 0
.dsect
aBi: .double // Four initial 16-bit blue values
aGi: .double // Four initial 16-bit green values
aRi: .double // Four initial 16-bit red values
aZi: .double // Two initial 32-bit Z values
.end
.text
// INITIALIZE ACCUMULATORS
.macro acc_init Lalign, Rtab, Rx, Ry, Fx, Fxh
// Lalign - left-end alignment (0..3) in two-byte units
// Rtab - register to use for addressing the table
// Rx, Ry, Fx, Fxh - scratch registers
mov acc_init_tab, Rtab
shl 5, Lalign, Lalign // Multiply by row width
adds Lalign, Rtab, Rtab // Index row corresponding to alignment
fld.d aZi(Rtab), | aZ Z : 
ixfr Zl. Fx os Z 7 
fld.d aRi(Rtab), aR R=-Load constant values 
shl il, Rx R-Shift starting value to hi-order 
fmov.ss Z ; 
shr : | R=-Redl stripped of sign bits 
fiadd.dd Z 
or R=-Form (Redl,Redl) 
ixfr | R-Put in 64=bit register 
fld.d 
Shl - 
fmov.ss 
shr 
fiadd.dd 
or. 
ixfr 
fld.d aBi(Rtab), 
shl | 16, 
fmov.ss Fx, 
shr 16, 
fiadd.dd Fx, 
or | Rx, 
ixfr Ry, 
fmov.ss Fx, 
fiadd.dd Fx, 
.endm


-Form (Redl,Redl,Redl,Redl) 


-~Add variables to constants 




Example 6. Accumulator Initialization 
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RENDERING PROCEDURE 
16-bit pixels, 16-bit Z-buffer
and 3, X1, Ra // Determine alignment of starting-point
acc_init Ra, Rb, Rc, Rd, Fa, Fah // Initialize accumulators
subs 4, Ra, Rb // 4 - alignment
subs dX, Rb, dX // Adjust dX by X1 alignment
// If dX <= 0, then right end is in same set as left end
and 3, dX, Rb // Determine alignment of right end
zmask Ra, Rb, Rc, Rd // Prepare both left- and right-end masks
left_end:: // Handle boundary conditions 
d.faddz aZ, iZ3, //- Taverpelate: 2 even Z values 
adds -8, _ FBP, ° // Anticipate autoincrement 
d.faddz aZ, // Interpolate 2 odd Z values 
adds -8, // Anticipate autoincrement 
d.form 1Zmask, , Mask 4 new Z values 
fld.d 8 (ZBP), 6 a% 3 Fetch 4 01d Z values 
d.faddp ' AaB, iB, Interpolate 4 blue intensities 
mov -4, Loop increment: 4 pixels 
d.faddp aG, j i Interpolate 4 green intensities 
adds — -4, > area Prepare dX for bla at end of loop 
d.faddp' $6 aR, i ~ aR // Interpolate 4 red intensities 
bla Ra, } Initialize LCC : 
d.form.  ° f0, © Ca, Move 4 new pixels to 64-bit reg _ 
adds 5; | : Are there any whole sets (dX < -5)? 
d.fzchks oldz, on > Mark closer points in PM[7..4] 
 sghort_segment «off Get out now if no whole set 


16(ZBP), oldz Fetch 4 old Z values. 
inner_loop:: // Handle all interior points
d.faddz aZ, iZ3, aZ // Interpolate 2 even Z values
d.faddz aZ, iZ1, aZ // Interpolate 2 odd Z values
fst.d newz, 8(ZBP)++ // Update Z buf from prior loop
d.form f0, newz // Move 4 new Z values to 64-bit reg
nop //
d.fzchks f0, f0, f0 // Shift PM[7..4] to PM[3..0]
mov -5, Rb // -5 mod 4 = 3, aligned right end
d.faddp aB, iB, aB // Interpolate 4 blue intensities
pst.d newi, 8(FBP)++ // Store pixels indicated by PM[3..0]
d.faddp aG, iG, aG // Interpolate 4 green intensities
Rb, dX, r0 // Are we at an aligned right end?
d.faddp aR, iR, aR // Interpolate 4 red intensities
aligned_end // Taken if at an aligned right end
d.form f0, newi // Move 4 new pixels to 64-bit reg
bla Ra, dX, inner_loop // Loop if not at end of line segment
d.fzchks oldz, newz, newz // Mark closer points in PM[7..4]
fld.d 16(ZBP), oldz // Fetch 4 old Z values for next loop
// End of inner_loop. Right end not aligned


Example 7. 3-D Rendering (1 of 2) 
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right_end:: // Handle boundary conditions 

d.faddz aZ, iZ3, aZ // Interpolate 2 even Z values
nop

d.faddz aZ, iZ1, aZ // Interpolate 2 odd Z values
fst.d newz, 8(ZBP)++ // Update Z buf from prior loop

d.form rZmask, newz // Mask 4 new Z values
nop

d.fzchks f0, f0, f0 // Shift PM[7..4] to PM[3..0]
nop

d.faddp aB, iB, aB // Interpolate 4 blue intensities

pst.d newi, 8(FBP)++ // Store pixels indicated by PM[3..0]

d.faddp aG, iG, aG // Interpolate 4 green intensities
nop

d.faddp aR, iR, aR // Interpolate 4 red intensities
nop


aligned_end:: // No special boundary conditions
d.form f0, newi // Move 4 new pixels to 64-bit reg


wrap_up //
d.fzchks oldz, newz, newz // Mark closer points in PM[7..4]


// 


short_segment::
d.fnop //
adds 8, dX, // Is right end in same set as left?
d.fnop //
bne.t right_end // Branch taken if no.
d.fnop //
fld.d 16(ZBP), oldz // Fetch 4 old Z values


wrap_up:: // Store the unstored and leave dual mode.
fzchks f0, f0, f0 // Shift PM[7..4] to PM[3..0]
fst.d newz, 8(ZBP)++ // Update Z buf from prior loop
fnop
pst.d newi, 8(FBP)++ // Store pixels indicated by PM[3..0]


Example 7. 3-D Rendering (2 of 2) 
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6.0 ALTERNATIVE IMPLEMENTATIONS 


Example 8 contrasts the inner loop of the 16-bit pixel rendering procedure with that of an 8-bit procedure. For 8-bit 
pixels, two faddp instructions accomplish 64-bits of pixel intensity interpolation; there is no need to maintain three 
separate color accumulators. Four faddz instructions (rather than two) are required, because eight Z values are 
created for the eight pixels per loop. 


// 8-bit Pixels, 16-Bit Zbuffer = 8 Pixels in 15 Clocks
// G-Unit | Core Unit
inner_loop::
d.faddz aZ,deltaZ1,aZ
d.faddz aZ,deltaZ2,aZ
d.form f0,newZ_A
d.faddz aZ,deltaZ1,aZ
d.faddz aZ,deltaZ2,aZ
d.form f0,newZ_B
d.fzchks oldZ_A,newZ_A,newZ_A
d.fzchks oldZ_B,newZ_B,newZ_B
d.faddp intens,dI,intens
d.faddp intens,dI2,intens
d.form f0,newi


fld.q 16(ZBP),oldZ_A
nop
nop
andh 0x8000,dX,r0
rightend


newZ_A,16(ZBP)++
0,dX,end
neg8,dX,inner_loop
newi,8(FBP)++

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// 16-Bit Pixels, 16-Bit Zbuffer = 4 Pixels in 10 Clocks 
// G-Unit | Core Unit
inner_loop::
d.faddz aZ,iZ3,aZ
d.faddz aZ,iZ1,aZ
d.form f0,newz
d.fzchks f0,f0,f0
d.faddp aB,iB,aB
d.faddp aG,iG,aG
d.faddp aR,iR,aR
d.form f0,newi
d.fzchks oldz,newz,newz


nop

fst.d newz,8(ZBP)++

nop
-5,Rb
newi,8(FBP)++
Rb,dX,r0
aligned_end
neg4,dX,inner_loop

16(ZBP),oldz


Example 8. Inner Loop of Renderers for Two Pixel Sizes | 
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ABSTRACT 


The i860 Processor computes floating-point results rapidly,
lending itself to DSP (digital signal processing) as
well as general-purpose computing. With this high per- 
formance, DSP functions can be added to any system 
containing an i860 CPU. A Fast Fourier Transform 
(FFT) illustrates this DSP power. Complete code for 
the FFT is presented in this application note, as well as 
performance measurements. Both complex and real in- 
put data FFTs are included, as well as both Decimation 
in Time and Decimation in Frequency. 


1.0 INTRODUCTION TO FAST 
FOURIER TRANSFORMS 


Discrete Fourier Transforms (DFTs) change time-do- 
main data samples into a frequency-domain profile of 
the sampled signal. The frequency-domain representa- 
tion consists of the magnitudes of sine waves at various 
frequencies, which would recreate the original data if 
superimposed. To accomplish the transform, a DFT 
adds combinations of the input data samples, after mul- 
tiplying some of those inputs with weighting factors. 
The number of samples, "N", is usually a power of two.


Each result in the frequency domain comes from a
weighted sum of all data samples. The weighting ("W")
factors are called "twiddles", and are complex cosine/
sine values for each particular frequency.


The FFT (Fast Fourier Transform) is an efficient implementation
of the DFT, defined by:


x(n) = the time-domain samples of the signal,
n = 0, 1, ..., N-1

X(k) = the Discrete Fourier Transform of x(n),
k = 0, 1, ..., N-1
     = the "frequency domain" equivalent of x(n)


\[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N^{nk} = e^{-j 2\pi nk/N}, \quad j = \sqrt{-1} \]


\[ X(k) = \sum_{n=0}^{N-1} x(n) \left( \cos(2\pi nk/N) - j \sin(2\pi nk/N) \right) \]


The (N-1) complex adds and (N-1) complex multiplications
required for each X(k) make the DFT an Order
(N^2) computation. Fortunately, the FFT decomposes
this to an Order (N * log2 N) algorithm by splitting the
N-sum into units of 2-sums. These units are called
"butterflies" because they produce 2 output values
from 2 inputs, with the butterfly-shaped dataflow
shown below. (Some FFT algorithms, called Radix-4,
use 4-input, 4-output butterflies.) The butterfly calculations
are executed in stages, with log2 N stages and N/2
butterflies per stage.
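
As a point of reference for the definition above, a direct evaluation of the DFT sum can be written in a few lines. This is a minimal sketch for checking results on small inputs, not the method used in this note; the function names are hypothetical. It also tabulates how the butterfly count of a radix-2 FFT compares with the N^2 term count of the direct sum.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: X(k) = sum over n of x(n) * exp(-j*2*pi*n*k/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]

def butterfly_count(n_pts):
    """Radix-2 FFT: log2(N) stages of N/2 butterflies each."""
    stages = n_pts.bit_length() - 1
    return stages * (n_pts // 2)

if __name__ == "__main__":
    print([round(abs(v), 6) for v in dft([1, 0, 0, 0])])  # an impulse has a flat spectrum
    for n_pts in (16, 256, 1024):
        print(n_pts, "points:", n_pts * n_pts, "direct terms,", butterfly_count(n_pts), "butterflies")
```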
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The subdivision, or decimation, of the N-sum into but- 
terflies can be done via two different methods: “Deci- 
mation in Time” (DIT) or “Decimation in Frequency” 
(DIF). The methods differ in the ordering of twiddles 
and the form of the butterfly arithmetic, but they yield 
the same answer. They are based on different mathe- 
matical derivations of the FFT: DIT results from recur- 
sively splitting the input time-domain samples into an 
even-indexed group and an odd-indexed, while DIF 
comes from splitting the DFT output frequency-do- 
main points into odd/even groups. 


2.0 BUTTERFLY DEFINED


Let A = the first input to the butterfly (complex
number, composed of Real part AR and
Imaginary part AI)


B = the second input to the butterfly (complex,
BR and BI)


W = twiddle factor (also complex, WR and
WI)


Anew = complex result #1, which overwrites A


Bnew = result #2, which overwrites B


For a "Decimation-in-Frequency" butterfly,
Anew = A + B
Bnew = (A - B) * W
The complex add, subtract, and multiply of a butterfly
decompose into 4 real multiplies, 3 real adds, and 3 real
subtracts:


AnewR = AR + BR
AnewI = AI + BI


tempR = AR - BR
tempI = AI - BI


BnewR = (tempR * WR) - (tempI * WI)


BnewI = (tempR * WI) + (tempI * WR)


For a "Decimation-in-Time" butterfly,
Anew = A + (B * W)
Bnew = A - (B * W)


The number of real operations remains 4 multiplies and
6 add/subtracts, but the equations differ and the multiplies
must be done first:


tempR = (WR * BR) - (WI * BI)
tempI = (WR * BI) + (WI * BR)
AnewR = AR + tempR    BnewR = AR - tempR


AnewI = AI + tempI    BnewI = AI - tempI
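
The two sets of real-arithmetic equations can be checked against ordinary complex arithmetic. The sketch below simply transcribes the equations of this section into Python; the function names are hypothetical, and nothing here reflects how the i860 code schedules the operations.

```python
def dif_butterfly(ar, ai, br, bi, wr, wi):
    """Decimation-in-Frequency: Anew = A + B, Bnew = (A - B) * W."""
    anew_r = ar + br
    anew_i = ai + bi
    temp_r = ar - br
    temp_i = ai - bi
    bnew_r = temp_r * wr - temp_i * wi
    bnew_i = temp_r * wi + temp_i * wr
    return anew_r, anew_i, bnew_r, bnew_i

def dit_butterfly(ar, ai, br, bi, wr, wi):
    """Decimation-in-Time: Anew = A + (B * W), Bnew = A - (B * W)."""
    temp_r = wr * br - wi * bi
    temp_i = wr * bi + wi * br
    return ar + temp_r, ai + temp_i, ar - temp_r, ai - temp_i

if __name__ == "__main__":
    a, b, w = complex(1.0, 2.0), complex(-0.5, 0.25), complex(0.6, -0.8)
    print(dif_butterfly(a.real, a.imag, b.real, b.imag, w.real, w.imag), a + b, (a - b) * w)
    print(dit_butterfly(a.real, a.imag, b.real, b.imag, w.real, w.imag), a + b * w, a - b * w)
```

Each function performs exactly 4 multiplies and 6 add/subtracts, matching the operation counts stated above.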
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Butterfly Dataflow: 


(Decimation in Frequency)          (Decimation in Time)


Anew = A + B                       Anew = A + (B * W)


Bnew = (A - B) * W                 Bnew = A - (B * W)


The stages, twiddles, and butterflies for 8-point FFTs
are shown in Figures 1 and 2. For larger values of N,
the dataflow patterns are very similar, with N/2 butterflies
executed at each stage, and a greater number of
stages. Refer to a text on Digital Signal Processing for a
complete discussion of FFT design, such as chapter 6 of
Theory and Application of Digital Signal Processing (see
the Bibliography at the end of this note).
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Figure 2. Decimation-In-Time FFT for 8 points 
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3.0 BIT REVERSAL 


Due to their structure, FFT algorithms have the side- 
effect of scrambling the ordering of output data. For 
radix-2 FFTs, the output is in “bit-reversed” order— 
for example, the value for frequency one is NOT at 
location one in the output array, but at location N/2. 
Time to unscramble the output is often NOT included 
in FFT benchmarking, because scrambled output is fine
for some signal-processing uses such as convolution. In 
any event, unscrambling consists of swapping the loca- 
tions of pairs of output values. Alternatively, input val- 
ues can be shuffled, as Decimation in Time usually does 
before the first stage (as shown in Figure 2). Otherwise, 
to avoid the shuffling of input in DIT, the twiddles 
must be accessed in bit-reversed order. As an example 
of bit-reversal, for 256 points the reordering involves: 


SWAP X(i) and X(j), where i = 'klmnopqr'b and j =
'rqponmlk'b. The second index (j) contains the same
bits as (i), but in reverse order.
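
The index mapping can be sketched in a few lines. This is an illustrative version only, with hypothetical function names; the note's own bit-reversal code instead reads precomputed index pairs from a table.

```python
def bit_reverse(i, bits):
    """Return the value whose low `bits` bits are those of i in reverse order."""
    j = 0
    for _ in range(bits):
        j = (j << 1) | (i & 1)
        i >>= 1
    return j

def unscramble(x):
    """Reorder a bit-reversed array by swapping each pair (i, j) exactly once."""
    n_pts = len(x)
    bits = n_pts.bit_length() - 1
    out = list(x)
    for i in range(n_pts):
        j = bit_reverse(i, bits)
        if i < j:  # swap each pair only once; palindromic indices stay in place
            out[i], out[j] = out[j], out[i]
    return out

if __name__ == "__main__":
    print(bit_reverse(1, 8))            # frequency one sits at location N/2 for N = 256
    print(unscramble(list(range(8))))
```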


4.0 FFT IMPLEMENTATION ON THE
i860 CPU


Several features of the i860 CPU contribute to FFT
performance. The floating-point multiplier and adder
can simultaneously produce 1 product and 1 sum per
cycle, using Dual-Operation FP instructions. To fetch
the butterfly inputs and store outputs, Dual-Instruction
Mode allows a memory fetch or store simultaneous
with the multiply and add. Four floating-point numbers
can be stored by one instruction, using the 16-byte-operand
"fst.q" instruction. Likewise, 16 bytes can be
fetched from the data cache in one fld.q op.


The floating-point arithmetic of the i860 CPU conforms
to IEEE 754 format, which some DSPs fail to do.
Shown below is code for the crucial inner loop of the
FFT:


//inner_loop: do 2 Decimation-In-Frequency FFT butterflies.
// Twelve clocks for 2 butterflies - 12 FP add/sub, 8 multiplies,
// 6 8-byte loads, 4 8-byte stores.
//  FP-op                          ;  Core-op.
inner_loop::
    d.r2pt.ss    WR,DI,BnewR       ;  pfld.d  wind(wstart),WRo
    d.pfsub.ss   AR,BR,AnewRo      ;  fld.d   8(fetch)++,ARo
    d.rat1s2.ss  AI,BI,AnewIo      ;  fld.d   offset(fetch),BRo
    d.i2st.ss    WI,DR,BnewI       ;  fst.q   AnewR,16(Store)++
    d.rat1p2.ss  AR,BR,DR          ;  adds    wincr,wind,wind
    d.ia1p2.ss   AI,BI,DI          ;  pfld.d  wind(wstart),WR
    d.r2pt.ss    WRo,DI,BnewRo     ;  adds    wincr,wind,wind
    d.pfsub.ss   ARo,BRo,AnewR     ;  fld.d   8(fetch)++,AR
    d.rat1s2.ss  AIo,BIo,AnewI     ;  fld.d   offset(fetch),BR
    d.i2st.ss    WIo,DR,BnewIo     ;  fst.q   BnewR,offset(Store)
    d.rat1p2.ss  ARo,BRo,DR        ;  bla     decrem,count,inner_loop
    d.ia1p2.ss   AIo,BIo,DI        ;  and     wlimit,wind,wind //modulo.
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5.0 CODE DESIGN 


Refer to the inner_loop above and code listings at the
end of this application note for the discussions that follow.
Refer to the "i860™ 64-bit Microprocessor Programmer's
Reference Manual" (Intel order number
240329) for details on instructions and formats.


The programs include both assembly and Fortran components.
Input data can number any power of 2 from
16 to 1024 points. The algorithms are radix-2, floating-
point, in-place. Included in the listing are both Decimation-in-Time
and Frequency, and both complex-input
and real-input FFTs.


5.1 Cache Utilization

Because the instruction cache contains 4 Kbytes, all required
code easily fits in cache. However, a 1024-point
complex FFT fills the 8-Kbyte data cache with the input
X() array. Thus the more rarely-used twiddle W()
array is intentionally kept out of cache, as described in
the "pfld" section.


A subroutine ("fetch.ss") is used to move the input data
array efficiently into cache for the 1024-point FFT.
"Fetch" allows all data to be brought into cache using
the next-near (NENE#) accesses to DRAM. Without
that routine, getting A and B from locations separated
by 4 Kbytes (NOT the same DRAM page) makes
fetches and writebacks from DRAM for the first stage
slower, and adds 30% to overall execution time.


For larger FFTs (2048 points = 16 kB), straightforward
expansion of the present algorithm would cause
increased cache misses. Thus a larger FFT should be
broken into multiple FFTs of 1024 points so that all 10
stages of each can achieve high cache hits. The algorithm
becomes (assuming 2048 points, Decimation-In-Time):

1) Bit-reverse the entire input array


2) Do a 10-stage FFT on the second set of 1024 points.
Cache hits should be high on those, since they were
most recently accessed by the bit-reversal.


3) Do a 10-stage FFT on the first 1024 points. Prefetch
before the first stage to ensure cache hits.
4) Combine the 2 separate 1024-point results with a final
stage of butterflies, where A is offset from B by
8 Kbytes.
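
The four steps above can be sketched at a small size to show that the split produces the same result as one large transform. This is a minimal illustration under stated assumptions, not the i860 implementation: it uses Python complex numbers, computes twiddles on the fly rather than reading a table, and all function names are hypothetical. A direct DFT is included only as a reference for comparison.

```python
import cmath

def dft(x):
    """Direct DFT, used here only as a reference result."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def bit_reverse_permute(x):
    """Step 1: bit-reverse the entire input array."""
    n, bits = len(x), len(x).bit_length() - 1
    out = list(x)
    for i in range(n):
        j, v = 0, i
        for _ in range(bits):
            j, v = (j << 1) | (v & 1), v >> 1
        if i < j:
            out[i], out[j] = out[j], out[i]
    return out

def dit_stages(a, lo, n):
    """In-place radix-2 DIT stages on a[lo:lo+n], which is already bit-reversed."""
    half = 1
    while half < n:
        for start in range(lo, lo + n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))
                t = w * a[start + k + half]
                u = a[start + k]
                a[start + k], a[start + k + half] = u + t, u - t
        half *= 2

def split_fft(x):
    n, h = len(x), len(x) // 2
    a = bit_reverse_permute(x)   # step 1
    dit_stages(a, h, h)          # step 2: second half, most recently touched
    dit_stages(a, 0, h)          # step 3: first half
    for k in range(h):           # step 4: final stage, A and B are h elements apart
        t = cmath.exp(-2j * cmath.pi * k / n) * a[k + h]
        u = a[k]
        a[k], a[k + h] = u + t, u - t
    return a

if __name__ == "__main__":
    data = [complex(0.5 * i, -i) for i in range(16)]
    err = max(abs(p - q) for p, q in zip(split_fft(data), dft(data)))
    print("max error vs direct DFT:", err)
```

With 2048 points and 8-byte complex values, the element distance h = 1024 in the final stage corresponds to the 8-Kbyte offset mentioned in step 4.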
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5.2 Pfid 


Twiddle factors (W) are fetched with pfld (Pipelined 
Floating-Point Load), to avoid caching them. Only in
the first stage are all the W() elements used; successive 
stages use fewer and fewer elements, which are separat- 
ed by larger and larger strides. Thus placing W() in 
cache would be inefficient. The streaming of W() from 
main memory actually yields better performance than 
caching W(), for 512 and 1024 points. With the i860 
CPU’s 8-byte external data bus, a complex W() value 
can be transferred in a single bus cycle. Some FFT rou- 
tines calculate W() on the fly, rather than fetching pre- 
calculated values; however, performance decreases due 
to the added run-time calculations. 
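
The shrinking, widening access pattern into W() can be listed explicitly. The sketch below assumes the per-stage parameters of the diff.f listing in the Appendix (groups = 2^(stage-1), offset = N / 2^stage, wincr = groups) and assumes that the butterflies of one group step through W() in increments of wincr; the function name is hypothetical.

```python
def twiddle_indices(m, stage):
    """Indices into W() touched by one group of DIF stage `stage` (1-based), N = 2**m."""
    groups = 1 << (stage - 1)   # number of sub-DFTs in this stage
    offset = 1 << (m - stage)   # butterflies per group
    wincr = groups              # stride between successive twiddles
    return [k * wincr for k in range(offset)]

if __name__ == "__main__":
    m = 4  # N = 16 points, so W() holds N/2 = 8 twiddles
    for stage in range(1, m + 1):
        print("stage", stage, twiddle_indices(m, stage))
```

For 16 points this prints all eight indices in stage 1, then every second, every fourth, and finally only index 0, which is the behaviour that makes caching W() unattractive.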


5.3 Fst.q


Quad-word (16-byte) stores allow 4 floating-point regis- 
ter values to update the cache in one cycle. Likewise, 
fld.q (Quad Floating Point Load) transfers 4 values to 
the registers in a cycle. However, in some FFT stages, 
double-word fetches (fld.d) are used instead of fld.q; 
that allows the "background" fetch of a set of operands
concurrent with arithmetic on the other set. For the 
same reason, the inner loop does 2 butterflies, rather 
than one. 


5.4 Bit Reversal Code 


The code for bit-reversal fetches the indices of 2 elements
to be swapped from a pre-allocated array of indices,
and swaps the data elements. Again, pfld.d keeps
the indices out of cache, for the 1024 point case. That
assembly version of bit-reversal is approximately 7
times faster than the standard Fortran routine. The array
of indices was generated by printing out the values
generated during operation of the standard Fortran version;
similarly, the twiddle W() values can be pre-allocated
and generated using a high-level-language program.


6.0 PIPELINE SCHEDULING 


The adder pipeline is 3 stages, as is the multiplier; for
the calculation of


BnewR = (AR - BR) * WR - (AI - BI) * WI


the adder result is fed back into the multiplier, and the
product again feeds into the adder. The adder and multiplier
pipes each advance one stage for each floating-
point instruction issued.
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The butterfly decomposes into 6 real add/subtracts and 
4 real multiplies. Thus the best possible performance 
would be 6 clocks per butterfly, with the multiplies to- 
tally overlapping the adds. The overlap is accomplished 
with the Dual-Operation instructions: 


r2pt (KR*src2, Treg + Mout, load KR <- src1)
rat1s2 (KR*Aout, src1 - src2, load T <- Mout)
i2st (KI*src2, Treg - Mout, load KI <- src1)
rat1p2 (KR*Aout, src1 + src2, load T <- Mout)
ia1p2 (KI*Aout, src1 + src2, load KI <- src1)


KR, KI, and T are operand registers feeding the multiplier
and adder, separate from the floating-point register
file. They permit the 4 inputs for multiply and add,
even though the instruction format holds only 2 registers.
"Aout" and "Mout" are adder and multiplier outputs.


The data path arrangements of some of these ops are 
illustrated in Figures 3 and 4. Fetching and storing of 
butterfly operands is overlapped with the calculations, 
using Dual Instruction Mode — the integer core op 
(such as a load or branch) and FP op are fetched simul- 
taneously from the instruction cache and executed 
simultaneously.


Scheduling of instructions was done with a pipeline diagram,
as illustrated in the comments of the code listing
of difstep.ss in the Appendix. (The comments show the
machine state after the instruction is processed.) Begin
by placing the desired results in the rightmost column,
then tracing progress backwards through the adder.
When adder inputs are products (of the multiplier), one
product is kept in the Treg for a cycle while the other
propagates through the multiplier final stage. Those
products can be traced back on the multiplier pipeline,
to determine at what instruction the multiplier inputs
must be provided.


Figure 3. Datapath for r2pt op (r2pt & r2st)


For example, place the BnewR label in the "Write"
stage of the pipe (the output of the Adder). Now


BnewR = WR * DR - WI * DI


Three instructions earlier, the adder inputs for BnewR
must be fed to the adder; those inputs are products, one of
which comes directly from the multiplier output, and
the other from the Treg. The multiplier output and
Treg value must then be traced back through multiplier
stages, requiring the following instructions:


i2st.ss WIo,DR,BnewIo as the 10th op of 12, to start (T - Mout)
rat1s2.ss AIo,BIo,AnewI as the 9th instruction, to update the Treg
ia1p2.ss AI,BI,DI as the 6th op, to multiply DI * WI
rat1p2.ss AR,BR,DR as the 5th op, to multiply DR * WR
rat1s2.ss AI,BI,AnewIo as the 3rd, to start DI into the adder
pfsub.ss AR,BR,AnewRo as the 2nd, to start DR into the adder


Figure 4. Datapath for rat1p2 op (rat1p2 & rat1s2)
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Some trial-and-error ordering of the desired outputs is 
needed to devise a sequence which keeps the adder 
pipeline full. An op is chosen for each slot for its ability 
to load the KR or KI register, or to initiate an adder 
operation simultaneous with the multiplies required to 
calculate BnewR and Bnewl. 


Handy hints to assist dual-operation scheduling include:


1) Feed back the adder result to the multiplier or vice
versa, whenever possible. For example, the rat1p2
op feeds adder-out to multiplier. Thus both src1 and
src2 fields of the instruction are available to feed the
adder-in, and a simultaneous useful add and multiply
are initiated.


2) Freeze one of the pipes, by using a pfadd or pfmul,
when appropriate. In the butterfly, where 6 adds are
done for every 4 multiplies, freezing of the multiplier
does not degrade performance. The freeze allows
multiplier results to be held until needed in the adder.


3) The T reg can hold a multiplier result for several
cycles until needed in the adder.


4) Unroll a loop to do 2 iterations per loop. That provides
time to fetch inputs for iteration 2 while calculating
iteration 1, and store results of iteration 1
(and fetch more inputs) while calculating iteration 2.


7.0 PERFORMANCE MEASUREMENTS 


The code was run on an evaluation card with DRAM 
memory only, no external cache, 33.33 MHz clock, and 
5 wait-states or more for some accesses. Next-near ac- 
cesses (address falls into the same DRAM page as the 
previous access) are zero wait-state, but far accesses 
take 5 or more wait-states. The code was run under a 
virtual-memory multitasking executive. Shown below 
are measured results: 


System: 33.3 MHz 80860 with a single bank of 
static-column DRAM 


Algorithm: Radix-2 FFT, in-place. Data is IEEE 754 
single-precision floating point. Implemented in assem- 
bly-language and Fortran code. 
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Time 
(including 
bit-reversal) 


Type of FFT 


1024-point-complex, DIF 
1024-point-real
512-point-complex, DIF 
512-point-real 
256-point-complex, DIF 
1024-point-complex, DIT 
512-point-complex, DIT 


7.1 Cache Fill and Writeback Time 


Measured times do not include cache-fill and write- 
back. That is, the timings measured 200,000 executions 
of the FFT using the same input array. (Performance 
figures offered by other manufacturers for DSP chips 
likewise assume that the data is already in on-chip 
RAM. Of course, the i860 CPU will do that fetching 
automatically into its data cache.) The additional time 
for cache fill and writeback were measured as: 


1024-point-complex 0.25 ms (8 Kbytes fetched,
8 Kbytes writeback)


512-point-complex 0.12 ms (4 Kbytes) 


To quantify the calculations in MFlops (Millions of 
FLoating-point OPerations per Second), consider that 
the 1024-point complex FFT is implemented with 
about 16,400 multiplies and 28,700 adds/subtracts. 
Thus the 1.17 ms translates to a sustained 38.5 MFlops 
rate. For 512 points, the required 20,000 Flops means
41.6 MFlops. 


The overall FFT is about 10 times faster than the equiv- 
alent Fortran. Inner loop performance was measured at 
13 cycles for the 24 instructions, which is 6.5 cycles per 
butterfly. 
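
The MFlops figures follow directly from the operation counts and times quoted above, and the arithmetic can be reproduced as shown below. The variable names are hypothetical; the 512-point time is not taken from the measurements but is back-computed here from the quoted 20,000 Flops and 41.6 MFlops.

```python
# Figures quoted in the text for the 1024-point complex FFT
multiplies = 16_400
add_subtracts = 28_700
time_1024_s = 1.17e-3

mflops_1024 = (multiplies + add_subtracts) / time_1024_s / 1e6

# 512 points: time implied by 20,000 Flops at 41.6 MFlops
time_512_s = 20_000 / 41.6e6

# Inner loop: 13 cycles for 24 instructions covering 2 butterflies
cycles_per_butterfly = 13 / 2

if __name__ == "__main__":
    print(f"1024-point rate: {mflops_1024:.1f} MFlops")
    print(f"512-point implied time: {time_512_s * 1e3:.2f} ms")
    print(f"cycles per butterfly: {cycles_per_butterfly}")
```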
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Pictured below are the programs developed for the i860 CPU FFT: 


ffttest.f 


bitrev.ss 


fetch 


The Fortran program ffttest.f is the highest-level pro- 
gram of those listed on the following pages. It calls two 
FFT subroutines, diff.f and fft.f, then compares their 
outputs. Fft.f is a Fortran decimation-in-time algo- 
rithm, while diff.f is the high-speed DIF routine. Diff.f 
is callable by C or Fortran applications. It in turn calls 
difstep, which is implemented in assembly code 
(difstep.ss). Difstep is called once per stage of the FFT. 
A Fortran version (difstepf.f) is shown, for comparison. 
Other assembly routines are the bit-reversal-data-move- 
ment (bitrev.ss) and prefetch (‘‘fetch”’ inside bitrev.ss). 


Difstep.ss contains approximately 225 assembly in- 
structions, and bitrev.ss contains about 24. The Fortran 
diff.f compiles to about 80 instructions. 


A Decimation-in-Time version of diff.f and difstep.ss 
can be found in ditt.f and ditstep.ss. The DIT version 
performs 5-10% slower than the Decimation-in-Fre- 
quency because the DIT loop takes 7 cycles per butter- 
fly, while DIF takes 6. 


A real-input algorithm is dirr.f, which can be called 
and tested using program real.f. Dirr.f calls difstep to 
do a complex DIF FFT on N real data points, but 
treats them as N/2 complex points. Then realfix.ss is 
called by dirr.f to fix the DIF output, compensating for 
the treatment of the N real points as N/2 complex. The 
derivation of the real-fix can be found in reference 3, 
Numerical Recipes in C. 
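
The idea of transforming N real points as N/2 complex points and then correcting the output can be sketched as follows. This is the standard textbook technique written out for small inputs, offered as an assumption about what the fix-up accomplishes rather than a transcription of realfix.ss; the function names are hypothetical and a direct DFT stands in for the complex FFT.

```python
import cmath

def dft(x):
    """Direct DFT, standing in for the complex FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def real_fft_via_half_complex(x):
    """Return X(0..N/2) of real input x using one N/2-point complex transform."""
    n, h = len(x), len(x) // 2
    # Pack even samples into the real parts and odd samples into the imaginary parts
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(h)]
    zf = dft(z)
    out = []
    for k in range(h + 1):
        zk = zf[k % h]
        zc = zf[(h - k) % h].conjugate()
        even = (zk + zc) / 2        # transform of the even-indexed samples
        odd = (zk - zc) / 2j        # transform of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out

if __name__ == "__main__":
    samples = [float((3 * i) % 7) for i in range(16)]
    ref = dft(samples)[: len(samples) // 2 + 1]
    err = max(abs(p - q) for p, q in zip(real_fft_via_half_complex(samples), ref))
    print("max error vs direct DFT:", err)
```

Only outputs 0 through N/2 are produced; for real input the remaining values are the complex conjugates of these.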


The mixture of Fortran, C, and assembly code is ac- 
complished by passing function inputs and outputs in 
registers. Only pointers and integer values were used in 
the above code, but floating point parameters can also 
be exchanged. A calling program feeds arguments to a 
function in r16, r17, and higher-numbered integer registers.
The callee is permitted to destroy the contents of
those registers, but r1:r15 must be preserved. For more
details on parameter-passing conventions see the i860 
64-bit Microprocessor Programmer’s Reference Manual, 
Chapter 8. 
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9.0 CONCLUSION 


The i860 CPU computes very Fast Fourier Transforms, 
quicker than most high-end dedicated DSP chips. Contributing
to the FFT performance are the 8-kByte on-chip
data cache and 4-kByte instruction cache. Also the
8-byte external data bus, pfld instruction, and 16-byte 
data cache width provide sufficient bandwidth to keep 
the arithmetic units busy. Dual-Operation instructions 
and Dual-Instruction-Mode allow parallel data move- 
ment and calculations. The 33.3 MHz clock rate allows 
both an add and a multiply every 30 ns, giving a time of 
1.17 ms for a 1024-point complex FFT. A 40 MHz i860 
Microprocessor will yield a time of less than 1 ms.


ACKNOWLEDGEMENTS 


The author wishes to thank Tricord Systems, Inc. for 
providing the key inner loop kernel design of the FFT. 


BIBLIOGRAPHY 


1. Gold, Bernard and Rabiner, Lawrence, Theory and
Application of Digital Signal Processing, 1975, Prentice-Hall
Inc., Englewood Cliffs, NJ. Pages 356-381, 573ff.


[This text explains DFT and FFT basics well, with
ample pictures]


2. Horden, Ira, "An FFT Algorithm For MCS®-96
Products Including Supporting Routines and Examples",
Intel Application Note AP-275, order number
270189. (That Application Note can also be found
in the Intel Embedded Controller Handbook, Volume II,
order number 210918)


[The note, dated 9/87, reviews FFT theory, real vs. 
complex, A/D issues, and waveforms] 


3. Press, William, Flannery, Brian, et al., Numerical
Recipes in C, 1988, Cambridge University Press.
Pages 398-424.


[Numerical Recipes contains the C-code source for 
“realfix’’] 
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APPENDIX A 
PROGRAM LISTINGS 


1) diff.f:

Fortran module to do fast Decimation-In-Frequency (DIF) Radix-2 FFT.
2) difstep.ss:

Assembly code which does all DIF FFT butterflies; called by diff.f.
3) difstepf.f:

Fortran equivalent of difstep.ss. Included here for clarity.
4) bitrev.ss:

Assembly code to do bit-reversal.
5) ffttest.f:

Highest-level Fortran code. Tests diff.f or ditt.f.
6) ditt.f:

Fortran module to do fast Decimation-In-Time (DIT) Radix-2 FFT.


7) ditstep.ss:


Assembly code which does all DIT FFT butterflies; called by ditt.f.
8) dirr.f:

Fortran module for Real-Input Decimation-In-Frequency (DIF) Radix-2 FFT.
9) realfix.ss:

Assembly code required by dirr.f to compensate for Real-Input.


10) real.f:


Highest-level Fortran code for Real-valued input. Tests dirr.f.
11) fft.f:
Fortran FFT algorithm giving "correct" answers for comparison against the other code.


12) makefile:


Unix V/386 version of a makefile to maintain the FFT code, using the Unix "make" program-maintenance
utility. Note that this makefile uses the Unix macro preprocessor "m4" to convert symbolic names
to register numbers.


13) start.ss:
Assembly code preamble for the runtime.
14) time.c:
Timing routine, used to install breakpoints.
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C2 


File: diff.f . 
FFT - Decimation in Freq, radix-2, inplace, l-dimen 


© 


C Intel assumes no reSponsibility for use or misuse of this code. 


C 5/19/89: call fetch8() added for 1024-point caching. 
C 6/01/89: fetch() CRUCIAL-30% performance loss if removed 


Inputs: 
A= complex array of input, up to 1024 pts, single-prec float 
M= log of number of pts 
= (number of stages of FFT) 
N = number of points. ie, N= 2**M = number of pts 
W= complex array of twiddle factors, length N/2. 
REV= 0 if bitreversed output ok. l=must re-order output 


Outputs: 
A= complex fft of input A 


Aa lA A aQAaraA aa a Ana 


      Subroutine diff(a,m,N,W,REV)
      integer m,N,i,j,k,REV,wlimit
      integer offset,stage,groups,wincr,powers2(0:10)
      complex a(n),w(N/2),temp

      data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV

C Twiddle factor array w(k) has (cos,-sin) of 2pi*k/N
CC Assume the caller provides w(k) constants ALREADY initialized

C Pre-touch data, lock into cache, for 8kByte fft:
      IF (N .gt. 513) THEN
         call fetch(a,%VAL(n))
      ENDIF

C wlimit = max byte index into W. (The assignment is missing from this
C listing; the value here is the one ditt.f uses.)
      wlimit = 8*((N/2) - 1)

C "DO 20" stage-loop
      DO 20 stage = 1,m
         groups = powers2(stage-1)
C groups=number of times the twiddle factors are used, ie, the number of
C smaller DFTs the stage is split into.

C offset gets N/2,N/4,N/8,N/16,...
         offset = powers2(m-stage)
         wincr = groups
         call difstep(a,w,groups,offset,wincr,wlimit)
   20 CONTINUE

      IF (REV .ne. 0) THEN
CC REV .ne. 0 means must do bit-reversal reordering of output
         call bitrev(a,%VAL(M),n)
      ENDIF

      RETURN
      END
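
The stage loop of diff.f can be traced with a short sketch. The Python below is an illustration only: the function name dif_stage_schedule is hypothetical and not part of the Intel sources. It lists the groups, offset and wincr values that the "DO 20" loop would hand to difstep for N = 2**m, assuming the loop body reads as in the listing above.

```python
def dif_stage_schedule(m):
    """List (stage, groups, offset, wincr) for each DIF stage, N = 2**m."""
    schedule = []
    for stage in range(1, m + 1):
        groups = 1 << (stage - 1)   # powers2(stage-1): sub-DFTs in this stage
        offset = 1 << (m - stage)   # powers2(m-stage): N/2, N/4, ..., 1
        wincr = groups              # stride through the twiddle table
        schedule.append((stage, groups, offset, wincr))
    return schedule


if __name__ == "__main__":
    for row in dif_stage_schedule(3):   # N = 8
        print(row)
```

In every stage groups*offset*2 equals N, which matches the relation stated in the difstepf.f header.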
// difstep.ss: do one stage of fft butterflies.
// DIF = Decimation in Frequency, radix-2, inplace, 1-dimension
// (C) Copyright 1989 INTEL Corporation.
// Inner loop developed with assistance from Tricord Systems, Inc.
//
// 5/18/89: 1 pm = offset_2 added, as next-to-last stage was slow
// 5/19/89: 4 pm = fetch8() routine added, for cache miss avoidance.
// 5/31/89: am = use fst.q (15% perf improvement of inner_loop!)
//               last_bfly added, for performance.
// 6/02/89: am = bptr deleted. Modulo-address W (5% perf improved)
//
// Do one entire stage (n/2 butterflies). Sample invocation:
//     call difstep(a,w,groups,offset,wincr,wlimit)
//
// Inputs:
//   A= complex array of input, single-prec float
//      (complex stored as 4byte real, 4byte imag contiguously)
//   W= pointer to array of twiddle factors. Assuming W(k) is
//      CMPLX(cos(2pi*k/N),-sin(2pi*k/N)) for k=0 to (N/2)-1.
//   offset = distance (except for scale-by-8byte sizeof(complex)) between
//      the 2 input values for each butterfly.
//      Offset also is the number of butterflies done per "group".
//   groups = N/(2*offset). The number of sub-DFTs this stage is split into.
//   wincr = distance (except for scale-by-8byte sizeof(complex)) between
//      successive w values for successive butterflies
//   wlimit = max index, in bytes, of W table.
//
// Outputs:
//   A= complex radix-2 butterflied version of input.
//
define(astart,r16)  //input data base address
define(wstart,r17)  //twiddle array ptr. Because w-contents depend on N,
                    // we will assume the caller has initialized w() array.
define(groups,r18)  //groups=number of sub-DFTs this stage is split into.
define(offset,r19)  //offset (initially elements, mult by 8 to get bytes)
                    // between node and its dual (the 2 numbers to butterfly, ie. A and B)
define(wincr,r20)   //increment between successive W values. Remains constant
                    // within a given stage. For Decimation in Freq, wincr addressing is:
                    // +8 for offset=N/2 (W0,W1,W2,W3,...W(n-1))
                    // +16 for offset=N/4 (W0,W2,W4,...) etc...
define(wlimit,r21)  //max index, in bytes, of W table.
define(wind,r22)    //current index, in bytes, of W table.
define(offset2,r23) //offset*2

define(decrem,r24)    //bla decrement
define(somecount,r25) // bla counter

define(FEtch,r26)   //pointer to 1st component of butterfly (load)
define(STore,r27)   //   "     "  1st component of butterfly (store)
// f4:f7 spare
define(AR,f12)   //element A, real component
define(AI,f13)   //   "    ", imag
define(ARo,f14)  // extra A value, for prefetch (o="odd")
define(AIo,f15)
define(BR,f16)   //element B, real component
define(BI,f17)
define(BRo,f18)  // extra B value, for prefetch
define(BIo,f19)

define(ER,f20)   //A+B, real (ER = AR + BR)
define(EI,f21)   //  "  imag  "
define(ERo,f22)  //A+B, real, previous loop's value
define(EIo,f23)  //  "  imag  "
define(FR,f24)   //W*(A-B), real
define(FI,f25)   //  "  imag  "
define(FRo,f26)
define(FIo,f27)

define(DR,f28)   //Difference of A-B, real part
define(DI,f29)   //   "    ", imag  "
define(WR,f30)   //W (twiddle factor), real part
define(WI,f31)   //   "    ", imag
define(WRo,f10)  //W (twiddle factor), real part (EXTRA copy)
define(WIo,f11)  //   "    ", imag


        .text
        .align .quad
_difstep_::
        ld.l    0(groups),groups    //fix Fortran call-by-ref
        ld.l    0(offset),offset    //
        shl     3,offset,offset     // change from elements to bytes
        shl     1,offset,offset2

        fst.q   f8,-16(sp)++        //save "local" regs
        fst.q   f12,-16(sp)++       //  "     "

        adds    -1,groups,groups    // pre-decrement for bne usage, or bla usage
        adds    -16,r0,decrem       //bla decrement

// We code the last 2 stages as special cases:
//-------
        xor     8,offset,r0     //offset=1, special case, no complex mult, funny addressing
        bc      offset_1        // (ASSUMING offset=1 means wincr=0, and no twiddle used)
        xor     16,offset,r0    //offset=2, special case, no complex mult, funny addressing
        bc      offset_2        // (ASSUMING offset=2 means wincr=N/4)

        ld.l    0(wincr),wincr
        ld.l    0(wlimit),wlimit
        pfadd.ss f0,f0,f0
        pfadd.ss f0,f0,f0
        pfadd.ss f0,f0,f0       // init A1,A2,A3=0
        pfmul.ss f0,f0,f0
        pfmul.ss f0,f0,f0
        pfmul.ss f0,f0,f0
//-------
// init pointers:
        shl     3,wincr,wincr   //scale for bytes.
        shl     1,wincr,wind    //init wind = 2*wincr

        pfld.d  0(wstart),f0
        pfld.d  wincr(wstart),f0
        adds    -8,astart,FEtch
        pfld.d  wind(wstart),f0
        adds    wincr,wind,wind //wind now 3*wincr
// here fetch first set of A,B,W before bla-loop
        pfld.d  wind(wstart),WR
        adds    wincr,wind,wind
        and     wlimit,wind,wind //modulo-wlimit the w index

// We do modulo-addressing on W(), to keep the pfld pipeline full. We
// never do a W-fetch beyond the end of the table.
// And the modulo-check needs to be done only every 4th pfld, as always
// we use a multiple of 4 W() factors.

        fld.d   8(FEtch)++,AR
        fld.d   offset(FEtch),BR
        d.r2ap1.ss f0,f0,f0     //clear Treg.
        adds    -32,offset,somecount // bla counter (predecrement by 4 elements)
adds -32,offset,somecount // bla counter (predecrement by 4 elements) 
//
// Definitions for pipe diagram:
// (the complex multiply product, F, breaks into 4 real mult and 2 adds):
//   WR = cos(), WI = -sin().
//   DR = AR - BR; (difference of Real components of A,B).
//   DI = AI - BI; (difference of Imag components)
//   ER = AR + BR; EI = AI + BI;
//   FR = K - L; where K= WR*DR, L=WI*DI
//   FI = N + M; where M= WI*DR, N=WR*DI

// For 1st time thru inner_loop, don't have correct values to store.
// Must do 1 loop before the loop, sans the stores.


first_bflys::   //fill pipe
// KR...KI...M1...M2...M3   A1...A2...A3...Write
        d.r2pt.ss   WR,f0,f0        // WR0
        pfld.d      wind(wstart),WRo
        d.pfsub.ss  AR,BR,f0        //
        fld.d       8(FEtch)++,ARo
        d.rat1s2.ss AI,BI,f0        //
        fld.d       offset(FEtch),BRo
        d.i2st.ss   WI,f0,f0        // WI0
        adds        wincr,wind,wind
        d.rat1p2.ss AR,BR,DR        //
        nop
        d.ia1p2.ss  AI,BI,DI        //
        pfld.d      wind(wstart),WR
        d.r2pt.ss   WRo,DI,f0       // WR1
        fld.d       8(FEtch)++,AR
        d.pfsub.ss  ARo,BRo,ER      //
        fld.d       offset(FEtch),BR
        d.rat1s2.ss AIo,BIo,EI      //
        adds        wincr,wind,wind
        d.i2st.ss   WIo,DR,f0       // WI1
        and         wlimit,wind,wind


quickstart::
        d.rat1p2.ss ARo,BRo,DR      // K1 M0 - N0 ER1 FR0 DI1 DR1
        bla         decrem,somecount,inner_loop  //init LCC
        d.ia1p2.ss  AIo,BIo,DI      // L1 K1 M0 N0 EI1 ER1 FR0 DI1
        adds        -16,astart,STore // ptrs init 16 low, for fst.q instructions
//------------------
// Each butterfly = 1 complex multiply, 1 complex add, 1 complex subtract
//   = 4 multiply,
//     3 add,
//     3 subtract,
//     3 8-byte fetches (A, B, W),
//     2 8-byte stores (A, B)
//
// 6 cycles per butterfly
//
// inner_loop: iterates "offset/2" times (eg, N/4 for stage 1, N/8 for stage 2),
// for each group. It does 2 butterflies per iteration.


inner_loop::
// KR...KI...M1...M2...M3   A1...A2...A3...Write
        d.r2pt.ss   WR,DI,FR        // WR2 - N1 L1 K1 N+M EI1 ER1
        pfld.d      wind(wstart),WRo
        d.pfsub.ss  AR,BR,ERo       // N1 L1 K1 DR2 FI0 EI1
        fld.d       8(FEtch)++,ARo
        d.rat1s2.ss AI,BI,EIo       // N1 L1 DI2 DR2 FI0
        fld.d       offset(FEtch),BRo
        d.i2st.ss   WI,DR,FI        // WI2 M1 - N1 K-L DI2 DR2 FI0
        fst.q       ER,16(STore)++  //update ER/EI/ERo/EIo
        d.rat1p2.ss AR,BR,DR        // K2 M1 ER2 FR1 DI2 DR2
        adds        wincr,wind,wind
        d.ia1p2.ss  AI,BI,DI        // L2 K2 M1 EI2 ER2 FR1 DI2
//no need for modulo-check ("and") here, as odd num of W's have been fetched.
        pfld.d      wind(wstart),WR

//....................................................................
// KR...KI...M1...M2...M3   A1...A2...A3...Write
        d.r2pt.ss   WRo,DI,FRo      // WR3 - N2 L2 K2 N1 N+M EI2 ER2 FR1
        adds        wincr,wind,wind
        d.pfsub.ss  ARo,BRo,ER      // N2 L2 K2 N1 DR3 FI1 EI2 ER2
        fld.d       8(FEtch)++,AR
        d.rat1s2.ss AIo,BIo,EI      // - N2 L2 K2 DI3 DR3 FI1 EI2
        fld.d       offset(FEtch),BR
        d.i2st.ss   WIo,DR,FIo      // WI3 M2 - N2 K2 K-L DI3 DR3 FI1
        fst.q       FR,offset(STore) //update FR/FI/FRo/FIo
        d.rat1p2.ss ARo,BRo,DR      // K3 M2 - N2 ER3 FR2 DI3 DR3
        bla         decrem,somecount,inner_loop
        d.ia1p2.ss  AIo,BIo,DI      // L3 K3 M2 N2 EI3 ER3 FR2 DI3
        and         wlimit,wind,wind //modulo.


end_inner_loop::        //KEEP Pipelines full
// RE-init pointers for fetches
        d.fiadd.ss  f0,f0,f0
        adds        offset2,astart,astart   // bump to next group
//redo A,B fetches, with proper ptr.
        d.fiadd.ss  f0,f0,f0
        fld.d       0(astart),AR    //get first src in next group
        d.fiadd.ss  f0,f0,f0
        fld.d       offset(astart),BR
        d.fiadd.ss  f0,f0,f0
        adds        0,astart,FEtch
last_bfly::     //do final 2 butterflies, start next group
// KR...KI...M1...M2...M3   A1...A2...A3...Write
        d.r2pt.ss   WR,DI,FR        // WR4 - N3 L3 K3 N2 N+M EI3 ER3 FR2
        pfld.d      wind(wstart),WRo
        d.pfsub.ss  AR,BR,ERo       // N3 L3 K3 N2 DR4 FI2 EI3 ER3
        fld.d       8(FEtch)++,ARo
        d.rat1s2.ss AI,BI,EIo       // - N3 L3 K3 DI4 DR4 FI2 EI3
        fld.d       offset(FEtch),BRo
        d.i2st.ss   WI,DR,FI        // WI4 M3 - N3 K3 K-L DI4 DR4 FI2
        fst.q       ER,16(STore)++
        d.rat1p2.ss AR,BR,DR        // K4 M3 - N3 ER4 FR3 DI4 DR4
        adds        wincr,wind,wind
        d.ia1p2.ss  AI,BI,DI        // L4 K4 M3 N3 EI4 ER4 FR3 DI4
        pfld.d      wind(wstart),WR
//....................................................................
// KR...KI...M1...M2...M3   A1...A2...A3...Write
        d.r2pt.ss   WRo,DI,FRo      // WR5 - N4 L4 K4 N3 N+M EI4 ER4 FR3
        fld.d       8(FEtch)++,AR
        d.pfsub.ss  ARo,BRo,ER      // N4 L4 K4 N3 DR5 FI3 EI4 ER4
        adds        -32,offset,somecount // reset bla counter
        d.rat1s2.ss AIo,BIo,EI      // - N4 L4 K4 DI5 DR5 FI3 EI4
        adds        wincr,wind,wind
        d.i2st.ss   WIo,DR,FIo      // WI5 M4 - N4 K4 K-L DI5 DR5 FI3
        adds        -1,groups,groups
        d.fnop
        fld.d       offset(FEtch),BR
        d.fnop
        bnc.t       quickstart      //branch on value of groups
        d.fnop
        fst.q       FR,offset(STore)
end_last_bfly::
        d.fnop
        br          endit
        fiadd.ss    f0,f0,f0
        fst.q       FR,offset(STore) //repeated for bnc.t untaken case


offset_1::
// want FEtch=0,2,4,6,8,... elements. ASSUMING wincr=0,
// and that w=(1,0), so that no complex mult needed, and NO W will be fetched.
// E=A+B, F=A-B. (Per double-butterfly loop: 8 pfadd, 4 dword fld, 4 fst,
// 1 bla) (fld.q required, to reduce # flds to avoid pipe stalls)
// Performance = 4 cyc/bfly best case.


//Redefine regs for fld.q,fst.q usage, when A and B adjacent:
define(AR3,f12)  //element A, real component
define(AI3,f13)  //   "    ", imag
define(BR3,f14)  //element B, real component
define(BI3,f15)
define(AR4,f16)  // extra A value, for prefetch
define(AI4,f17)
define(BR4,f18)  // extra B value, for prefetch
define(BI4,f19)

define(ER3,f20)  //A+B, real (ER = AR + BR)
define(EI3,f21)  //  "  imag  "
define(FR3,f22)  //(A-B), real
define(FI3,f23)  //  "  imag  "

define(ER4,f24)  //A+B, real, extra copy
define(EI4,f25)  //  "  imag

define(FR4,f26)
define(FI4,f27)


        adds        -16,astart,FEtch
        fld.q       16(FEtch)++,AR4
        adds        -1,groups,somecount // bla counter (groups predecremented already by 1)
//using groups=blacount on the offset_1 loop
        adds        -16,FEtch,STore
//startup the loop:
// A1......A2......A3......Write:
        d.pfadd.ss  AR4,BR4,f0      // ARn+BRn -   -   -
        fld.q       16(FEtch)++,AR3
        d.pfadd.ss  AI4,BI4,f0      // AIn+BIn ERn -
        adds        -2,r0,decrem    //2 bflies per loop
        d.pfsub.ss  AR4,BR4,f0      // ARn-BRn EIn ERn
        bla         decrem,somecount,offset1_loop //init LCC
        d.pfsub.ss  AI4,BI4,ER4     // AIn-BIn FRn EIn ERn

// A1......A2......A3......Write:
offset1_loop::
        d.pfadd.ss  AR3,BR3,EI4     // AR+BR   FI- FR- EI-
        nop
        d.pfadd.ss  AI3,BI3,FR4     // AI+BI   ER  FI- FR-
        fld.q       16(FEtch)++,AR4
        d.pfsub.ss  AR3,BR3,FI4     // AR-BR   EI  ER  FI-
        fst.q       ER4,16(STore)++
        d.pfsub.ss  AI3,BI3,ER3     // AI-BI   FR  EI  ER
        nop
        d.pfadd.ss  AR4,BR4,EI3     // AR2+BR2 FI  FR  EI
        fld.q       16(FEtch)++,AR3
        d.pfadd.ss  AI4,BI4,FR3     // AI2+BI2 ER2 FI  FR
        nop
        d.pfsub.ss  AR4,BR4,FI3     // AR2-BR2 EI2 ER2 FI
        bla         decrem,somecount,offset1_loop
        d.pfsub.ss  AI4,BI4,ER4     // AI2-BI2 FR2 EI2 ERnext
        fst.q       ER3,16(STore)++
//-------------------------

end_offset1_loop::
        d.fiadd.ss  f0,f0,f0
        br          endit
        fiadd.ss    f0,f0,f0


        .align .quad
offset_2::
// want FEtch=0,1;4,5;8,9;12,13;... elements.
// ASSUMING wincr=N/4 (W addr=0,N/4,0,N/4,0,...). Trivial W() factors.
// USE bla loop, incrementing FEtch by 16 (2*offset).
// Even-indexed elements identical to offset_1, W=W0, no complex mult.
//   So FReven=(AR-BR), FIeven=(AI-BI).
// Odd components have W=(0,-1). So FRodd=(AI-BI), FIodd=(BR-AR).
// Each fld.q fetches AReven,AIeven,ARodd,AIodd.

//Assume ER,EI,ERo,EIo are 4 contiguous regs.
//Assume FR,FI,FRo,FIo are 4 contiguous regs.

        adds        -16,astart,FEtch
        fld.q       16(FEtch)++,AR
        fld.q       16(FEtch)++,BR
        adds        0,groups,somecount //bla counter

//startup the loop:
// A1......A2......A3......Write:
        pfadd.ss    AR,BR,f0        // AR+BRe
        pfadd.ss    AI,BI,f0        // AI+BIe  ER
        d.pfadd.ss  ARo,BRo,f0      // ARo+BRo EI  ER
        nop
        d.pfadd.ss  AIo,BIo,ER      // AIo+BIo ERo EI  ER
        nop
        d.pfsub.ss  AR,BR,EI        // AR-BRe  EIo ERo EI
        adds        -1,r0,decrem    //2 bflies per loop, but groups is half desired value.
        d.pfsub.ss  AI,BI,ERo       // AI-BIe  FR  EIo ERo
        adds        -16,astart,STore
        d.pfsub.ss  AIo,BIo,EIo     // AIo-BIo FI  FR  EIo
        bla         decrem,somecount,offset2_loop //init LCC
        d.pfsub.ss  BRo,ARo,FR      // BRo-ARo FRo FI  FR
        nop
offset2_loop::
        d.fnop
        fld.q       16(FEtch)++,AR  //fetch AR,AI,ARo,AIo
        d.fnop
        fld.q       16(FEtch)++,BR  //fetch BR,BI,BRo,BIo
// A1......A2......A3......Write:
        d.pfadd.ss  AR,BR,FI        // AR+BRe  FIo FRo FI
        nop
        d.pfadd.ss  AI,BI,FRo       // AI+BIe  ER  FIo FRo
        nop
        d.pfadd.ss  ARo,BRo,FIo     // ARo+BRo EI  ER  FIo
        fst.q       ER,16(STore)++  //update ER,EI,ERo,EIo
        d.pfadd.ss  AIo,BIo,ER      // AIo+BIo ERo ER
        nop
        d.pfsub.ss  AR,BR,EI        // AR-BRe  EIo EI
        nop
        d.pfsub.ss  AI,BI,ERo       // AI-BIe  FR
        fst.q       FR,16(STore)++
        d.pfsub.ss  AIo,BIo,EIo     // AIo-BIo FI
        bla         decrem,somecount,offset2_loop
        d.pfsub.ss  BRo,ARo,FR      // BRo-ARo FRo
        nop


endit::
// restore regs
        fiadd.ss    f0,f0,f0        //exit DIM
        fld.q       0(sp),f12
        fiadd.ss    f0,f0,f0        //last DIM
        fld.q       16(sp),f8
        adds        32,sp,sp
        bri         r1
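
The modulo addressing on W() used in the listing above ("and wlimit,wind,wind") can be illustrated outside the assembly. The Python sketch below is not part of the Intel sources; the function name w_byte_offsets is hypothetical. It assumes wlimit = 8*((N/2)-1), the value ditt.f computes, and it applies the mask on every step for clarity, whereas the assembly applies it only on every 4th pfld.

```python
def w_byte_offsets(n, wincr_elems, count):
    """Byte offsets into the N/2-entry twiddle table, wrapped with an AND mask."""
    wlimit = 8 * (n // 2 - 1)      # max byte index of W; works as a mask because
                                   # N/2 is a power of 2 and offsets are multiples of 8
    wincr = 8 * wincr_elems        # scale elements to bytes (8 bytes per complex)
    wind = 0
    offsets = []
    for _ in range(count):
        offsets.append(wind)
        wind = (wind + wincr) & wlimit   # wrap instead of running off the table
    return offsets


if __name__ == "__main__":
    print(w_byte_offsets(16, 2, 6))
```

Because the table length is a power of two, the AND gives the same result as a remainder by the table size in bytes, which is why no compare-and-branch is needed to keep the prefetch pipeline fed.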




c difstepf.f: do one stage of fft (DIF) butterflies
c (C) Copyright 1989 INTEL Corporation. ALL RIGHTS RESERVED.
c
c Decimation in Freq, radix-2, inplace, 1-dimen
c 6/20/89
c
c Do one entire stage (n/2 butterflies). Sample invocation:
c     call difstep(a,w,groups,offset,wincr)
c
c Inputs:
c   A= complex array of input, single-prec float
c      (complex stored as 4byte real, 4byte imag contiguously)
c   W= pointer to array of twiddle factors. Assuming W(k) is
c      CMPLX(cos(2pi*k/N),-sin(2pi*k/N)) for k=0 to (N/2)-1.
c   offset = distance (in "elements") between
c      the 2 input values for each butterfly
c   groups = number of sub-DFTs this stage is split into.
c      (groups*offset*2 = N)
c   wincr = distance between successive w values for successive butterflies.
c
c Outputs:
c   A= complex butterflied version of input.
c


      SUBROUTINE difstep(a,w,groups,offset,wincr)
      integer groups,offset,wincr
      integer i,j,index1,iplus
      complex a(groups*offset*2),w(groups*offset),wtemp,temp

c We implement a...
c Special case for offset=1 (last stage): no complex multiplies, simple add
c (Performance enhancement)
      IF (offset .eq. 1) THEN
CVD$ NODEPCHK
         DO 8 i = 1,(2*groups),2
            iplus = i + 1
            temp = a(iplus)
            a(iplus) = a(i) - temp
    8    a(i) = a(i) + temp
      ELSE
C Special case for offset=2 (next-to-last stage): no complex multiplies,
CC simple add. (Performance enhancement)
CC For half the butterflies, W=(1,0). For the other half, W=(0,-1)
      IF (offset .eq. 2) THEN
CVD$ NODEPCHK
         DO 90 i = 1,(4*groups),4
            iplus = i + 2
            temp = a(iplus)
            a(iplus) = a(i) - temp
   90    a(i) = a(i) + temp
C 2nd call to i-loop: W=(0.,-1.)
CVD$ NODEPCHK
CVD$ NOVECTOR
         DO 92 i = 2,(4*groups),4
            iplus = i + 2
            temp = a(i) - a(iplus)
            a(i) = a(i) + a(iplus)
   92    a(iplus) = CMPLX(AIMAG(temp),-REAL(temp))
      ELSE
C "DO 20" index1-loop is "outer loop"
CVD$ VECTOR
CVD$ NODEPCHK
         DO 20 index1 = 1,(2*offset*groups),(2*offset)
            j = 1
CVD$ NODEPCHK
CVD$ ALTCODE
            DO 10 i = index1,(index1+offset-1)
               iplus = i + offset
               temp = a(i) - a(iplus)
               a(i) = a(i) + a(iplus)
               a(iplus) = w(j) * temp
   10       j = j + wincr
   20    CONTINUE
      ENDIF
      ENDIF
      RETURN
      END
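
The general-case butterfly loop of difstep can be checked against a direct DFT. The Python below is a sketch under stated assumptions, not Intel code: it uses 0-based indexing, double precision, and hypothetical helper names (dif_fft, naive_dft, bit_reverse_index), and it omits the offset=1 and offset=2 special cases, which are only performance shortcuts for the same arithmetic.

```python
import cmath


def difstep(a, w, groups, offset, wincr):
    """One DIF stage in place, mirroring the DO 20 / DO 10 loops (0-based)."""
    for index1 in range(0, 2 * offset * groups, 2 * offset):
        j = 0
        for i in range(index1, index1 + offset):
            iplus = i + offset
            temp = a[i] - a[iplus]
            a[i] = a[i] + a[iplus]
            a[iplus] = w[j] * temp      # twiddle applied to the difference
            j += wincr


def dif_fft(a):
    """Run all stages; the result is left in bit-reversed order."""
    n = len(a)
    m = n.bit_length() - 1
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    for stage in range(1, m + 1):
        groups = 1 << (stage - 1)
        offset = 1 << (m - stage)
        difstep(a, w, groups, offset, groups)
    return a


def naive_dft(x):
    """Reference O(N**2) DFT used only for comparison."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bit_reverse_index(i, m):
    """Reverse the low m bits of i."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r
```

Output bin k is found at position bit_reverse_index(k, m), which is why diff.f calls bitrev when REV is non-zero.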
cccccccccccccccccccccccccccccccccccc
      Subroutine fetch(a,n)
      integer n
      complex a(n),temp
C Kludge do-nothing prefetch.
      temp = a(1)
      RETURN
      END
cccccccccccccccccccccccccccccccccccc
      Subroutine bitrev(a,dummy,n)
C Bit-Reverse
C Inputs:
C   A= complex array of input, single-prec float
C   dummy = %val(m). Probably unusable from Fortran.
C   N = number of input points (and output points)
C
C Output:
C   A = original A data, but in bit-reversed order
C
      integer n,i,j,k,ndiv2
      complex a(n),temp

C "DO 7" loop to in-place-bit-reverse-shuffle output
      j = 1
      ndiv2 = n / 2
      DO 7 i = 1, n-1
         IF (i .lt. j) THEN
            temp = a(j)
            a(j) = a(i)
            a(i) = temp
         ENDIF
         k = ndiv2
C "While (j .gt. k)" /*decrease j by 2**something */
    6    IF (j .gt. k) THEN
            j = j-k
            k = k/2
            GOTO 6
         ENDIF
C Add next lower power of 2 to j
    7 j = j+k
      RETURN
      END
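
The "DO 7" shuffle above can be followed more easily in a 0-based form. The Python below is a sketch, not part of the original sources; the name bitrev_in_place is hypothetical. The counter j tracks the bit-reversed value of i, and the inner while loop plays the role of the label-6 IF/GOTO.

```python
def bitrev_in_place(a):
    """Reorder a (length a power of 2) into bit-reversed index order, in place."""
    n = len(a)
    j = 0                         # bit-reversed counterpart of i
    for i in range(n - 1):
        if i < j:                 # swap each pair once only
            a[i], a[j] = a[j], a[i]
        k = n // 2
        while k <= j:             # "decrease j by 2**something"
            j -= k
            k //= 2
        j += k                    # add next lower power of 2 to j
    return a


if __name__ == "__main__":
    print(bitrev_in_place(list(range(8))))
```

The test i < j ensures that symmetric index patterns (those equal to their own reversal) are left alone and every other pair is exchanged exactly once.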
// bitrev.ss
// (C) Copyright 1989 INTEL Corporation. ALL RIGHTS RESERVED.

// Bit-reversal of 8byte array elements.
// IN PLACE.
// (Allows arrays of 8,16,32,64,128,256,512, or 1024 elements)

// Invocation: (from Fortran)
//     call bitrev(a,%VAL(m))

// Inputs:
//   a= r16 = pointer to array of 8byte elements
//   m= r17 (call by value)= base-2 log of total number of elements
//      (2**m = N)
// Outputs:
//   a= Bit-reversed ordered version of A
//

// Expected best-can-do performance, and measured performance:
//   approx 4*N clocks (0.06 mSec for 512 points)

//-------------------
define(astart,r16)  //initial input data base address
define(m,r17)
define(logN,r17)
define(dest1,r19)
define(dest2,r20)
define(dest3,r21)
define(dest4,r22)
define(iptr,r23)    //index-array pointer

define(decrem,r24)  //bla decrement
define(count,r25)   // bla counter

        .text
        .align .quad

//fetch base address for index table (rbasetab)
// base-addr-table elements = (baseaddr, number_of_swaps-2)
// base-addr-table indexed by logN.

        shl     3,logN,r30      //scale to 8-byte entry length
        mov     rbasetab,r29
        ld.l    r29(r30),iptr
        addu    4,r29,r29
        ld.l    r29(r30),count  //number of swaps required for this value N

        pfld.d  0(iptr),f0      //initiate fetch of first 2 bit-rev indices
        pfld.d  8(iptr)++,f0
        adds    -2,r0,decrem    //2 swaps per loop
        pfld.d  8(iptr)++,f0

        bla     decrem,count,revloop //init LCC
        pfld.d  8(iptr)++,f16   //get 2 indices, but don't cache the indices
revloop::       //2 swaps per loop
// ... cycles consumed for each swap, best case.
        pfld.d  8(iptr)++,f18   //2 more indices
        fxfr    f16,dest1       //transfer to integer index regs
        fxfr    f17,dest2
        fld.d   dest1(astart),f24 //fetch 2 elements to swap
        fld.d   dest2(astart),f26
        fxfr    f18,dest3
        fst.d   f24,dest2(astart)
        fst.d   f26,dest1(astart)
        fxfr    f19,dest4
        fld.d   dest3(astart),f28
        fld.d   dest4(astart),f30
        pfld.d  8(iptr)++,f16   //2 more indices
        fst.d   f28,dest4(astart)
        bla     decrem,count,revloop //
        fst.d   f30,dest3(astart)

        bri     r1


// _fetch8_: Touch all 32-byte lines in the 8k data bytes, to get them
// into dcache. (ASSUMING .lte. 8Kbytes and .gte. 4Kbytes)
//
// Invocation= fetch(astart,num8)
// Inputs=
//   astart=r16=pointer to data which is to be touched.
//   num8=r17 (passed by VALUE, %VAL(), not by reference)
//
// Using RC and RB to improve dcache hit rates, for FFTs bigger than
// 1024 complex (8kB).
// RC=10 causes replacement only of block denoted by RB bits. RC=11 disables
// replacement.

define(num8,r17)
define(FEtch,r26)


_fetch8_::
// lock dcache
        ld.c    dirbase,r30
        or      0x800,r30,r30   // Replace Dcache slot 0 only (RC=10,RB=00)
        st.c    r30,dirbase
// Put 4Kbytes into Dcache slot 0. (The rest after 4kB goes to slot 1).

        adds    -4,r0,decrem    //4 8-byte-groups per cache line
        adds    508,r0,count    //512, but pre-decremented for bla usage.
        bla     decrem,count,floop
        adds    -32,astart,FEtch
floop::
        bla     decrem,count,floop
        fld.d   32(FEtch)++,f30 //dummy load.

        adds    -512,num8,count
        bc      fdone           //if data exhausted, quit
//      ld.c    dirbase,r30
        or      0x900,r30,r30   // Replace Dcache slot 1 only (RC=10,RB=01)
        st.c    r30,dirbase
        adds    -8,count,count  //predecr for bla
        bla     decrem,count,floop2 //set LCC.
        fld.d   32(FEtch)++,f30
floop2::
        bla     decrem,count,floop2
        fld.d   32(FEtch)++,f30 //dummy load.
fdone:
// unlock dcache
        andnot  0xF00,r30,r30   //clear RC,RB (dirbase(11:8))
        st.c    r30,dirbase


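// rbasetab: (Table of bit-reversed indices for bitrev subroutine)
// base-addr-table elements = (baseaddr, number_of_swaps-2)
// base-addr-table indexed by logN.
// (Swap counts after rev8 are illegible in this listing; the values below are
//  computed as (N - number of symmetrical patterns)/2 - 2.)
        .align .quad
rbasetab:
        .long [6]0              //don't bother with log(n)=0,1,2
        .long rev8, 0
        .long rev16, 4
        .long rev32, 10
        .long rev64, 26
        .long rev128, 54
        .long rev256, 118
        .long rev512, 238
        .long rev1024, 494

//Number of swaps = 240 for N=512. (ie, 32 symmetrical patterns
// exist between 0 and 511.)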

// rev512: array of bit-reversed indices, for N=512.
// Each entry is ("i", and "bit-reversed-i"), shifted left by 3
// to account for 8-byte-elements.
// NOTE: This listing DOES NOT SHOW all the table elements, to save paper.

        .align .quad
rev512::
        .long 8, 2048, 16, 1024
        .long 24, 3072, 32, 512
        .long 40, 2560, 48, 1536
// ETC..., ETC..., ETC...

        .align .quad
rev1024::
        .long 8, 4096, 16, 2048
        .long 24, 6144, 32, 1024
        .long 40, 5120, 48, 3072
        .long 56, 7168, 64, 512
// ETC..., ETC..., ETC...
//Number of swaps = 496
//N (Number of elements) = 1024


        .align .quad
rev16::
        .long 1*8,8*8,2*8,4*8
        .long 3*8,12*8,5*8,10*8
        .long 7*8,14*8,11*8,13*8
rev8::
        .long 1*8,4*8,3*8,6*8


        .align .quad
rev32::
        .long 8, 128, 16, 64, 24, 192, 40, 160, 48, 96, 56, 224


        .align .quad
rev64::
        .long 8, 256, 16, 128
        .long 24, 384, 32, 64
        .long 40, 320, 48, 192
        .long 56, 448, 72, 288
// ETC..., ETC..., ETC...

        .align .quad
rev128::
        .long 8, 512, 16, 256
        .long 24, 768, 32, 128
        .long 40, 640, 48, 384
        .long 56, 896, 72, 576
// ETC..., ETC..., ETC...
//Number of swaps = 56, N (Number of elements) = 128

        .align .quad
rev256::
        .long 8, 1024, 16, 512
        .long 24, 1536, 32, 256
        .long 40, 1280, 48, 768
        .long 56, 1792, 64, 128
// ETC..., ETC..., ETC...
//Number of swaps = 120, N (Number of elements) = 256
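
The index tables above follow directly from the bit-reversal rule, so they can be regenerated rather than typed in. The Python below is a sketch only (the names bit_reverse_index and bitrev_swap_table are hypothetical, not from the Intel sources). It emits each pair (i, bit-reversed i) with i smaller than its reversal, shifted left by 3 for 8-byte elements, in the same order as the listings.

```python
def bit_reverse_index(i, m):
    """Reverse the low m bits of i."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def bitrev_swap_table(m):
    """Byte-offset swap pairs for an N = 2**m element array of 8-byte items."""
    table = []
    for i in range(1 << m):
        r = bit_reverse_index(i, m)
        if i < r:                        # skip symmetrical patterns and duplicates
            table.append((i << 3, r << 3))
    return table


if __name__ == "__main__":
    for m in (7, 8, 10):
        print(1 << m, "elements:", len(bitrev_swap_table(m)), "swaps")
```

The pair counts produced this way agree with the swap counts quoted in the listing comments for N = 128, 256 and 1024.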
      PROGRAM FFTTEST
C
C 1-D FFT TEST PROGRAM
C
C Intel assumes no responsibility for use or misuse of this code.
C
C 7/20/89
C
      character*8 REALLY
      PARAMETER (IREV=0)
      PARAMETER (REALLY='complex')
      PARAMETER (TIMEIT=1, CACHETIME=0)
      DATA IT/200000/
      PARAMETER (N=1024, M=10)
C     PARAMETER (N=512,M= 9)
C     PARAMETER (N=256,M= 8)
C     PARAMETER (N=128,M= 7)
C     PARAMETER (N=64,M= 6)
C     PARAMETER (N=32,M= 5)
C     PARAMETER (N=16,M= 4)
      PARAMETER (PI=3.1415926536)
      COMPLEX X(N),X1(N),X2(N),X3(N),W(N/2)
C Fortran complex values stored R,I, R,I for arrays.
      Real ASQR(N),ASQR2(N),XR(N)
      complex wtemp
      real rtemp

      PRINT *,' FFT test program (ffttest.f) .....'
      print *
      IF (IREV .eq. 0) THEN
         print *,'NOT counting time for bit-reversal.'
         print *,'DO NOT expect matching answers, without bit-rev'
      ELSE
         print *,'Time for bit-reversal included.'
      ENDIF

      print *,'Time for cache writeback and fills...'
      IF (CACHETIME .eq. 0) THEN
         print *,' NOT included, if iterating.'
      ELSE
         print *,' ... included.'
      ENDIF

      print *
      print *,'If iterating... Number of Iterations =',IT
      print *
      print *,'Number of Points =',N
      print *,'(',REALLY,' data)'
      print *
C Init twiddle factor array w(k) with (cos,-sin) of 2pi*k/N
C (Should just declare this as constant, if N is non-variable)
C (OR could have one constant 512-entry W (for N=1024), adjust wincr accordingly
C  in diff.f for smaller N)
      rtemp = 2.0*pi/N
      wtemp = CMPLX(cos(rtemp),-sin(rtemp))
      w(1) = (1.0, 0.0)
      DO 200 k = 2,N/2
  200 w(k) = wtemp * w(k-1)
CC    print *,' W (twiddle) initialization completed......'
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C INITIALIZE input data
C
      PIN = (4*PI)/N
      DO 100 I=1,N
C For testing with Sinewave input data:
C        Treal = COS(I*PIN)
C        Timag = SIN(I*PIN)
C For testing with Squarewave input:
C        IF (I .lt. N/2) THEN
C           Treal
C           Timag
C        ELSE
C           Treal
C           Timag
C        ENDIF
C For testing with ramp function input data:
         Treal = I - 1.0
         Timag = Treal + 0.5
         X(I) = CMPLX(Treal,Timag)
         X1(I) = CMPLX(Treal,Timag)
         X2(I) = CMPLX(Treal,Timag)
         X3(I) = CMPLX(Treal,Timag)
  100 CONTINUE
C 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
IF (TIMEIT .ne. 0) THEN 


CALL fft (X2, M, N) 
cc Subroutine fft is Decimation-In-Time, Fortran version. 


CALL 
CALL 
ENDIF 


cccccccccccccccccccccccccccccccccccccccc
      IF (IREV .ne. 0) THEN
      IF (TIMEIT .eq. 0) THEN
         call vcompare(X,X2,2*N)
         call cmags(X,N,ASQR)
c cmags to take squared magnitude of complex values
         call cmags(X2,N,ASQR2)
C print non-zero results:
         J=0
         DO 700 I = 1,N
            IF ((ASQR(I) .GT. 1.0) .OR. (ASQR2(I) .GT. 1.0)) THEN
               WRITE (6,22) (I-1), ASQR(I), ASQR2(I)
   22 FORMAT (' I-1=',I4,' ASQR(I)= ',F14.2,' ASQR2(I)= ',F14.2//)
               J = J+1
               IF (J .GT. 32) GOTO 725
            ENDIF
  700    CONTINUE

  725    CALL TIME
      ENDIF
      ENDIF


      IF (TIMEIT .ne. 0) THEN
cccccccccccccccccccccccccccccccccccccccc
ccc Timing loop follows:

         print *,' Start Ass. FFT'
         IF (CACHETIME .eq. 0) THEN
            DO 500 I = 1,IT,4
C Reuse same array, so cache fill and writeback time NOT included.
               CALL diff(X, M, N,W,IREV)
               CALL diff(X, M, N,W,IREV)
               CALL diff(X, M, N,W,IREV)
  500       CALL diff(X, M, N,W,IREV)
         ELSE
            DO 504 I = 1,IT,4
C Alternating between X,X1,X2,X3 should provoke cache misses.
               CALL diff(X, M, N,W,IREV)
               CALL diff(X1, M, N,W,IREV)
               CALL diff(X2, M, N,W,IREV)
  504       CALL diff(X3, M, N,W,IREV)
         ENDIF
         print *,' END Ass. FFT'
cccccccccccccccccccccccccccccccccccccccc
      ENDIF
      STOP
      END
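
The twiddle-factor initialization in FFTTEST builds w(k) by repeated multiplication (the "DO 200" loop) rather than by calling cos and sin for every k. The Python below is a sketch of that recurrence, not Intel code; the function names are hypothetical, it uses 0-based indexing, and it runs in double precision, so it says nothing about the rounding behaviour of the single-precision Fortran.

```python
import cmath


def twiddles_by_recurrence(n):
    """w[k] = wtemp * w[k-1], with w[0] = 1 and wtemp = exp(-2*pi*i/n)."""
    wtemp = cmath.exp(-2j * cmath.pi / n)   # (cos, -sin) of 2*pi/N
    w = [1 + 0j]
    for _ in range(1, n // 2):
        w.append(wtemp * w[-1])
    return w


def twiddles_direct(n):
    """Reference values: (cos, -sin) of 2*pi*k/N evaluated for each k."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]


if __name__ == "__main__":
    rec = twiddles_by_recurrence(1024)
    ref = twiddles_direct(1024)
    print(max(abs(a - b) for a, b in zip(rec, ref)))
```

The recurrence costs one complex multiply per entry, at the price of rounding error that accumulates along the table; as the listing's own comment suggests, a precomputed constant table avoids both costs when N is fixed.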
      Subroutine vcompare(res,exp,n)
c VCOMPARE compares 2 REAL vectors, prints out 1st few miscompares
c
      integer n, errcnt
      real res(n), exp(n)

      write(6,12)
   12 format ('*** VCOMPARE: vector comparison beginning ***')

      data errcnt/0/
      do 30 i=1,n
         if (AINT(res(i)) .ne. AINT(exp(i))) then
c {print out error, exit if a lot already}
  120       print *,'*** Error in compares ***'
            write(6,121) i
  121       format(' Item number = ',I6)
            write(6,124) res(i), exp(i)
  124       format(' Res_=',F14.2,' Expected_=',F14.2)
            errcnt = errcnt + 1
            if (errcnt .gt. 19) then
               return
            end if
         end if
   30 continue

      if (errcnt .eq. 0) then
  190    print *,' *** vector compares SUCCESSFUL ***'
      end if

   99 return
      end
C File: ditt.f
C 6/15/89
C
C Intel assumes no responsibility for use or misuse of this code.
C
C FFT - Decimation in TIME, radix-2, inplace, 1-dimen
C
C Inputs:
C   A= complex array of input, up to 1024 pts, single-prec float
C   M= log (base 2) of number of pts
C    = (Number of stages of FFT)
C   N= number of points, ie, N = 2**M
C   W= complex array of twiddle factors, length N/2.
C   REV= ignored parameter.
C
C Outputs:
C   A= complex fft of input A. Correct order (bit-reversal done).
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

      Subroutine ditt(a,m,N,W,REV)
      integer m,N,i,REV,wlimit
      integer offset,stage,groups,wincr,powers2(0:10)
      complex a(n),w(N/2),temp

      data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV

C Twiddle factor array w(i) has (cos,-sin) of 2pi*i/N.
CC Assume the caller provides w(i), constants ALREADY initialized

C Pre-touch data, lock into cache, for 8kByte fft:
      IF (N .gt. 513) THEN
         call fetch(a,%VAL(n))
      ENDIF

      call bitrev(a,%VAL(M),n)
C Bitreversal of input needed for in-place decim in time FFT, to avoid
C fetching twiddle-factors in bitrev order.

      wlimit = 8*((N/2) - 1)

      DO 20 stage = 1,m
         groups = powers2(m-stage)
C groups=number of times the twiddle factors are used, ie, the number of
C smaller DFTs the stage is split into.

C offset gets 1,2,4,8,...N/2
         offset = powers2(stage-1)
         wincr = groups
         call ditstep(a,w,groups,offset,wincr,wlimit)
   20 CONTINUE

      RETURN
      END
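
The structure of ditt (bit-reverse the input, then run stages with offset 1, 2, 4, ... N/2) can be checked with a short model. The Python below is a sketch under stated assumptions, not Intel code: the butterfly is taken from the ditstep.ss register comments (E = A + B*W, F = A - B*W), the helper names are hypothetical, indexing is 0-based, and arithmetic is double precision.

```python
import cmath


def bitrev_in_place(a):
    """Reorder a (length a power of 2) into bit-reversed index order."""
    n = len(a)
    j = 0
    for i in range(n - 1):
        if i < j:
            a[i], a[j] = a[j], a[i]
        k = n // 2
        while k <= j:
            j -= k
            k //= 2
        j += k
    return a


def ditstep(a, w, groups, offset, wincr):
    """One DIT stage in place: E = A + B*W, F = A - B*W."""
    for index1 in range(0, 2 * offset * groups, 2 * offset):
        j = 0
        for i in range(index1, index1 + offset):
            iplus = i + offset
            p = w[j] * a[iplus]     # product B*W
            a[iplus] = a[i] - p
            a[i] = a[i] + p
            j += wincr


def dit_fft(a):
    """Bit-reverse the input, then run m stages; output is in natural order."""
    n = len(a)
    m = n.bit_length() - 1
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    bitrev_in_place(a)
    for stage in range(1, m + 1):
        groups = 1 << (m - stage)   # powers2(m-stage)
        offset = 1 << (stage - 1)   # powers2(stage-1): 1, 2, 4, ..., N/2
        ditstep(a, w, groups, offset, groups)
    return a


def naive_dft(x):
    """Reference O(N**2) DFT used only for comparison."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Unlike the DIF routine, the twiddle is applied before the add and subtract, and the result needs no reordering afterwards.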
// ditstep.ss: do one stage of fft butterflies
// DIT = Decimation in Time, radix-2, inplace, 1-dimension
// (C) Copyright 1989 INTEL Corporation. ALL RIGHTS RESERVED.
// 7/15/89
//
// Do one entire stage (n/2 butterflies). Sample invocation:
//     call ditstep(a,w,groups,offset,wincr,wlimit)
//
// Inputs:
//   A= complex array of input, single-prec float
//      (complex stored as 4byte real, 4byte imag contiguously)
//   W= pointer to array of twiddle factors. Assuming W(k) is
//      CMPLX(cos(2pi*k/N),-sin(2pi*k/N)) for k=0 to (N/2)-1.
//   offset = distance (except for scale-by-8byte sizeof(complex)) between
//      the 2 input values for each butterfly.
//      Offset also is the number of butterflies done per "group".
//   groups = N/(2*offset). The number of sub-DFTs this stage is split into.
//   wincr = distance (except for scale-by-8byte sizeof(complex)) between
//      successive w values for successive butterflies
//   wlimit = max index, in bytes, of W table.
//
// Outputs:
//   A= complex radix-2 butterflied version of input.
//

define(astart,r16)  // input data base address
define(wstart,r17)  //twiddle array ptr. Because w-contents depend on N,
                    // we will assume the caller has initialized w() array.
define(groups,r18)  //groups=number of sub-DFTs this stage is split into.
define(offset,r19)  //offset (initially elements, mult by 8 to get bytes)
                    // between node and its dual (the 2 numbers to butterfly, ie. A and B)
define(wincr,r20)   //increment between successive W values. Remains constant
                    // within a given stage.
define(wlimit,r21)  //max index, in bytes, of W table.
define(wind,r22)    //current index, in bytes, of W table.
define(offset2,r23) //offset*2

define(decrem,r24)    //bla decrement
define(somecount,r25) // bla counter

define(FEtch,r26)   //pointer to 1st component of butterfly (load)
define(STore,r27)   //   "     "  1st component of butterfly (store).

define(offsetp8,r28) //offset+8
define(ARe,f12) //element A, real component
define(AIe,f13) //" ", imag
define(ARo,f14) // extra A value, for prefetch (o="odd")
define(AIo,f15)
define(BRe,f16) //element B, real component
define(BIe,f17)
define(BRo,f18) // extra B value, for prefetch
define(BIo,f19)

define(ERe,f20) //A+(B*W), real (ER = AR + BR)
define(EIe,f21) // " imag "
define(ERo,f22) // previous loop's value
define(EIo,f23) // " imag "

define(FRe,f24) //A-(B*W), real
define(FIe,f25) // " imag "
define(FRo,f26) // previous loop's value
define(FIo,f27) // " imag "

define(PR,f28) //(B*W), real
define(PI,f29) //(B*W), imag

define(WRe,f30) //W (twiddle factor), real part
define(WIe,f31) //" " , imag

define(WRo,f10) //W (twiddle factor), real part (EXTRA copy)
define(WIo,f11) //" " , imag

.text
.align .quad
_ditstep_::
    ld.l 0(groups),groups //fix Fortran [rest of comment illegible]
    ld.l 0(offset),offset //
    shl 3,offset,offset // change from elements to bytes
    shl 1,offset,offset2
    adds 8,offset,offsetp8

    fst.q f8,-16(sp)++ //save "local" regs
    fst.q f12,-16(sp)++ // " "

    adds -1,groups,groups // pre-decrement for bnc usage, or bla usage
    adds -16,r0,decrem //bla decrement


// We code the last 2 stages as special cases:
// (the opcodes of the following six lines are missing from the scan)
    8,offset,r0 //offset=1, special case, no complex mult, funny addressing
    offset_1 // (ASSUMING offset=1 means wincr=0, and no twiddle used)
    16,offset,r0 //offset=2, special case, no complex mult
    offset_2
    0(wincr),wincr
    0(wlimit),wlimit
    pfadd.ss f0,f0,f0 // init A1,A2,A3=0
    pfadd.ss f0,f0,f0
    pfadd.ss f0,f0,f0
    pfmul.ss f0,f0,f0
    pfmul.ss f0,f0,f0
    pfmul.ss f0,f0,f0


// init pointers:
    shl 3,wincr,wincr //scale for bytes.
    shl 1,wincr,wind //init wind = 2*wincr

    pfld.d 0(wstart),f0
    pfld.d wincr(wstart),f0
    adds -8,astart,FEtch
    pfld.d wind(wstart),f0
    adds wincr,wind,wind //wind now 3*wincr
// here fetch first set of B,W before bla-loop
    pfld.d wind(wstart),WRe
    adds wincr,wind,wind
//first B fetch from offset, then 1st A fetch from 0.
    fld.d offsetp8(FEtch),BRe //first B value

    and wlimit,wind,wind //modulo-wlimit the w index
// We do modulo-addressing on W(), to keep the pfld pipeline full. We
// never do a W-fetch beyond the end of the table.
// And the modulo-check needs to be done only every 4th pfld, as always
// we use a multiple of 4 W() factors.

    d.r2ap1.ss f0,f0,f0 //clear T reg.
    adds -32,offset,somecount // bla counter (predecremented by 4 elements)
//
// Definitions for pipe diagram:
// Anew = E = A+(B*W)
// Bnew = F = A-(B*W)
// Let P=(B*W).
// WR = cos(), WI=-sin().
// PR = K - L; where K= WR*BR, L=WI*BI
// PI = N + M; where N= WI*BR, M=WR*BI
// ER = AR + PR (Overwrites AR)
// EI = AI + PI ( " AI)
// FR = AR - PR ( " BR)
// FI = AI - PI ( " BI)

// For 1st time thru inner-loop, don't have correct values to store.
// Must do 1 loop before the loop, sans the stores.


first_bfly::
//fill pipe
// KR...KI...M1....M2....M3 T A1....A2....A3....Write
    d.r2pt.ss WRe,f0,f0 // WRe - - - - - - - -
    pfld.d wind(wstart),WRo
    d.i2st.ss WIe,f0,f0 // WIe
    adds wincr,wind,wind
    d.r2ap1.ss f0,BRe,f0 // K0 - - - - - - -
    fld.d 8(FEtch)++,ARe //first A value
    d.pfmul.ss WIe,BIe,f0 // L0 K0 - - - - - -
    pfld.d wind(wstart),WRe
    d.r2pt.ss WRo,BIe,f0 // WRo M0 L0 K0 - - - -
    fld.d offsetp8(FEtch),BRo
    d.rat1s2.ss f0,PR,f0 // - M0 L0 K0 - - - -
    adds wincr,wind,wind
    d.i2st.ss WIo,BRe,f0 // WIo N0 - M0 K0 K-L0 - - -
    nop

    d.r2ap1.ss f0,BRo,f0 // K1 N0 - M0 - PR0 - -
    and wlimit,wind,wind
    d.pfsub.ss f0,PI,f0 // K1 N0 - M0 - - PR0 -
    fld.d 8(FEtch)++,ARo
    d.pfadd.ss ARe,PR,PR // K1 N0 - M0 ER0 - - PR0
    fld.d offsetp8(FEtch),BRe
    d.pfmul.ss WIo,BIo,f0 // L1 K1 N0 M0 ER0 - - -
    nop
    d.r2pt.ss WRe,BIo,f0 // WRe M1 L1 K1 M+N0 ER0 - -
    bla decrem,somecount,restart //init LCC
    d.rat1s2.ss ARe,PR,f0 // - M1 L1 K1 FR0 PI0 ER0 -
    nop
restart::
    d.i2st.ss WIe,BRo,ERe // WIe N1 - M1 K1 K-L1 FR0 PI0 ER0
    adds -16,astart,STore // ptrs init 16 low, for fst.q [rest of comment illegible]
// Each butterfly = 1 complx multiply, 1 complx add, 1 complx subtract
//= 4 multiply, 3 add, 3 subtract
// 3 8-byte fetches (A, B, W)
// 2 8-byte stores (A, B)
//
// 7 cycles per butterfly
//

// inner_loop: iterates "offset/2" times
// for each group. It does 2 butterflies per iteration

// AR/AI fetches need to be a cycle behind BR/BI fetches here. So we
// must index with offset+8 into B.
// AR is used 1/2 loop before AI.
// Pattern= AI0,AR1,BR2,BI2; AI1,AR2,BR3,BI3.


inner_loop:: // KR...KI...M1....M2....M3 T A1....A2....A3....Write
    d.r2ap1.ss AIe,BRe,PI // K2 N1 - M1 EI0 PR1 FR0 PI0
    pfld.d wind(wstart),WRo
    d.pfsub.ss AIe,PI,FRe // K2 N1 - M1 FI0 EI0 PR1 FR0
    fld.d 8(FEtch)++,ARe
    d.pfadd.ss ARo,PR,PR // K2 N1 - M1 ER1 FI0 EI0 PR1
    fld.d offsetp8(FEtch),BRo
    d.pfmul.ss WIe,BIe,f0 // L2 K2 N1 M1 ER1 FI0 EI0 -
    adds wincr,wind,wind
    d.r2pt.ss WRo,BIe,EIe // WRo M2 L2 K2 M+N1 ER1 FI0 EI0
    pfld.d wind(wstart),WRe
    d.rat1s2.ss ARo,PR,FIe // - M2 L2 K2 FR1 PI1 ER1 FI0
    adds wincr,wind,wind
    d.i2st.ss WIo,BRe,ERo // WIo N2 - M2 K2 K-L2 FR1 PI1 ER1
    and wlimit,wind,wind //modulo.
// KR...KI...M1....M2....M3 T A1....A2....A3....Write
    d.r2ap1.ss AIo,BRo,PI // K3 N2 - M2 EI1 PR2 FR1 PI1
    nop
    d.pfsub.ss AIo,PI,FRo // K3 N2 - M2 FI1 EI1 PR2 FR1
    fld.d 8(FEtch)++,ARo
    d.pfadd.ss ARe,PR,PR // K3 N2 - M2 ER2 FI1 EI1 PR2
    fld.d offsetp8(FEtch),BRe
    d.pfmul.ss WIo,BIo,f0 // L3 K3 N2 M2 ER2 FI1 EI1 -
    nop
    d.r2pt.ss WRe,BIo,EIo // WRe M3 L3 K3 M+N2 ER2 FI1 EI1
    fst.q ERe,16(STore)++ //update ERe/EIe/ERo/EIo
    d.rat1s2.ss ARe,PR,FIo // - M3 L3 K3 FR2 PI2 ER2 FI1
    bla decrem,somecount,inner_loop
    d.i2st.ss WIe,BRo,ERe // WIe N3 - M3 K3 K-L3 FR2 PI2 ER2
    fst.q FRe,offset(STore) //update FRe/FIe/FRo/FIo


saa taneicleups: : //KEEP Piedad: full 
// RE-init pointers for fetches
    d.fiadd.ss f0,f0,f0
    adds offset2,astart,astart //bump to next group
//redo A,B fetches, with proper ptr.
    d.fiadd.ss f0,f0,f0
    fld.d offset(astart),BRe //get first BR/BI in next group
    d.fiadd.ss f0,f0,f0
    adds -8,astart,FEtch


last_bfly:: //do final 2 butterflies, start next group
// KR...KI...M1....M2....M3 T A1....A2....A3....Write
    d.r2ap1.ss AIe,BRe,PI // K0 N3 - M3 EI2 PR3 FR2 PI2
    pfld.d wind(wstart),WRo
    d.pfsub.ss AIe,PI,FRe // K0 N3 - M3 FI2 EI2 PR3 FR2
    fld.d 8(FEtch)++,ARe
    d.pfadd.ss ARo,PR,PR // K0 N3 - M3 ER3 FI2 EI2 PR3
    fld.d offsetp8(FEtch),BRo
    d.pfmul.ss WIe,BIe,f0 // L0 K0 N3 M3 ER3 FI2 EI2 -
    adds wincr,wind,wind
    d.r2pt.ss WRo,BIe,EIe // WRo M0 L0 K0 M+N3 ER3 FI2 EI2
    pfld.d wind(wstart),WRe
    d.rat1s2.ss ARo,PR,FIe // - M0 L0 K0 FR3 PI3 ER3 FI2
    adds wincr,wind,wind
    d.i2st.ss WIo,BRe,ERo // WIo N0 - M0 K0 K-L0 FR3 PI3 ER3
    and wlimit,wind,wind //modulo

    d.r2ap1.ss AIo,BRo,PI // K1 N0 - M0 EI3 PR0 FR3 PI3
    adds -32,offset,somecount // reset bla counter
    d.pfsub.ss AIo,PI,FRo // K1 N0 - M0 FI3 EI3 PR0 FR3
    fld.d 8(FEtch)++,ARo



    d.pfadd.ss ARe,PR,PR // K1 N0 - M0 ER0 FI3 EI3 PR0
    fld.d offsetp8(FEtch),BRe
    d.pfmul.ss WIo,BIo,f0 // L1 K1 N0 M0 ER0 FI3 EI3 -
    bla decrem,somecount,nowhere //re-init LCC=1
    d.r2pt.ss WRe,BIo,EIo // WRe M1 L1 K1 M+N0 ER0 FI3 EI3
    adds -1,groups,groups
nowhere::
    d.rat1s2.ss ARe,PR,FIo // - M1 L1 K1 FR0 PI0 ER0 FI3
    fst.q ERe,16(STore)++
    d.fnop
    bnc.t restart //branch on value of groups
    d.fnop
    fst.q FRe,offset(STore)

end_last_bfly::
    d.fnop
    br endit
    fiadd.ss f0,f0,f0
    fst.q FRe,offset(STore) //repeated for bnc.t untaken case
.align .quad


// want FEtch=[element list illegible] elements. ASSUMING wincr=0,
// and that w=(1,0), so that no complex mult needed.
// E=A+B, F=A-B. (Per double-butterfly loop: 8 pfadd, 4 dword fld, 4 fst,
// 1 bla) (fld.q used to reduce # flds)
// Performance = 4 cyc/bfly best case.

//Redefine regs for fld.q,fst.q usage, A and B adjacent.
define(AR3,f12) //element A, real component
define(AI3,f13) // " ", imag
define(BR3,f14) //element B, real component
define(BI3,f15)

define(AR4,f16) // extra A value, for prefetch
define(AI4,f17)
define(BR4,f18)
define(BI4,f19)

define(ER3,f20) //A+B, real (ER = AR + BR)
define(EI3,f21) // " imag "
define(FR3,f22) //(A-B), real
define(FI3,f23) // " imag

define(ER4,f24) //A+B, real
define(EI4,f25) // " imag
define(FR4,f26) //(A-B), real
define(FI4,f27) // " imag

    adds -16,astart,FEtch
    fld.q 16(FEtch)++,AR4
    adds -1,groups,somecount // bla counter (predecremented already by 1)
//using groups=blacount on the offset_1 loop, intentionally.
    adds -16,FEtch,STore
//startup the loop:
// A1....A2....A3....Write:
    d.pfadd.ss AR4,BR4,f0 // ARn+BRn - - -
    fld.q 16(FEtch)++,AR3
    d.pfadd.ss AI4,BI4,f0 // AIn+BIn ERn -
    adds -2,r0,decrem //2 bflies per loop
    d.pfsub.ss AR4,BR4,f0 // ARn-BRn EIn ERn
    bla decrem,somecount,offset1_loop //init LCC
    d.pfsub.ss AI4,BI4,ER4 // AIn-BIn FRn EIn ERnext

// A1....A2....A3....Write:
offset1_loop::
    d.pfadd.ss AR3,BR3,EI4 // AR+BR FI- FR- EI-
    nop
    d.pfadd.ss AI3,BI3,FR4 // AI+BI ER FI- FR-
    fld.q 16(FEtch)++,AR4
    d.pfsub.ss AR3,BR3,FI4 // AR-BR EI ER FI-
    fst.q ER4,16(STore)++
    d.pfsub.ss AI3,BI3,ER3 // AI-BI FR EI ER
    nop
    d.pfadd.ss AR4,BR4,EI3 // AR2+BR2 FI FR EI
    fld.q 16(FEtch)++,AR3
    d.pfadd.ss AI4,BI4,FR3 // AI2+BI2 ER2 FR
    nop
    d.pfsub.ss AR4,BR4,FI3 // AR2-BR2 EI2 FI
    bla decrem,somecount,offset1_loop
    d.pfsub.ss AI4,BI4,ER4 // AI2-BI2 FR2 ERnext
    fst.q ER3,16(STore)++

end_offset1_loop::
    d.fiadd.ss f0,f0,f0
    br endit
    fiadd.ss f0,f0,f0
    nop
.align .quad
offset_2::
// want FEtch=0,1;4,5;8,9;12,13;... elements.
// ASSUMING wincr=N/4 (W addr=0,N/4,0,N/4,0,...). Trivial W() factors.
// Even-indexed elements identical to offset_1, W=W0, no complex mult.
// So EReven=(AR+BR), EIeven=(AI+BI).
// So FReven=(AR-BR), FIeven=(AI-BI).
// Odd components have W=(0,-1). So B*W = (BI,-BR).
// So ERodd=Re(A+(B*W)) = (AR+BI). EIodd=(AI-BR).
// So FRodd=Re(A-(B*W)) = (AR-BI). FIodd=(AI+BR).
// Each fld.q fetches AReven,AIeven,ARodd,AIodd.

//Assume ERe,EIe,ERo,EIo are 4 contiguous regs.
//Assume FRe,FIe,FRo,FIo are 4 contiguous regs.
//Assume ARe,AIe,ARo,AIo are 4 contiguous regs.
    adds -16,astart,FEtch
    fld.q 16(FEtch)++,ARe
    fld.q 16(FEtch)++,BRe
    adds 0,groups,somecount //bla counter
//startup the loop:
// A1....A2....A3....Write:
    pfadd.ss ARe,BRe,f0 // AR+BRe
    pfadd.ss AIe,BIe,f0 // AI+BIe ER
    d.pfadd.ss ARo,BIo,f0 // ARo+BIo EI
    nop
    d.pfsub.ss AIo,BRo,ERe // AIo-BRo ERo EI ER
    nop
    d.pfsub.ss ARe,BRe,EIe // AR-BRe EIo ERo EI
    adds -1,r0,decrem //2 bflies per loop, but groups is half desired value.
    d.pfsub.ss AIe,BIe,ERo // AI-BIe FR EIo ERo
    adds -16,astart,STore
    d.pfsub.ss ARo,BIo,EIo // ARo-BIo FI FR EIo
    bla decrem,somecount,offset2_loop //init LCC
    d.pfadd.ss AIo,BRo,FRe // AIo+BRo FRo FI FR
    nop

offset2_loop::
    fld.q 16(FEtch)++,ARe //fetch AR,AI,ARo,AIo
    fld.q 16(FEtch)++,BRe
// A1....A2....A3....Write:
    d.pfadd.ss ARe,BRe,FIe // AR+BRe FIo FRo FI
    nop
    d.pfadd.ss AIe,BIe,FRo // AI+BIe ER FIo FRo
    nop
    d.pfadd.ss ARo,BIo,FIo // ARo+BIo EI ER FIo
    fst.q ERe,16(STore)++ //update ER,EI,ERo,EIo
    d.pfsub.ss AIo,BRo,ERe // AIo-BRo ERo EI ER
    nop
    d.pfsub.ss ARe,BRe,EIe // AR-BRe EIo ERo EI
    nop
    d.pfsub.ss AIe,BIe,ERo // AI-BIe FR
    fst.q FRe,16(STore)++
    d.pfsub.ss ARo,BIo,EIo // ARo-BIo FI
    bla decrem,somecount,offset2_loop
    d.pfadd.ss AIo,BRo,FRe // AIo+BRo FRo
    nop

endit::
// restore regs.
    fiadd.ss f0,f0,f0 //exit DIM
    fld.q 0(sp),f12
    fiadd.ss f0,f0,f0 //last DIM pair
    fld.q 16(sp),f8
    adds 32,sp,sp
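
The butterfly that the listing's comments define (E = A + B*W, F = A - B*W, with offset butterflies per group) can be sketched in portable C. This is only an illustration of the arithmetic, not a transcription of the pipelined code: the name butterfly_stage is hypothetical, sizes are in elements rather than bytes, and the sketch assumes the twiddle index restarts at zero for each group, where the assembly instead wraps a running index with wlimit.

```c
#include <complex.h>
#include <stddef.h>

/* One radix-2 stage over groups*2*offset complex elements.
 * For butterfly j of each group: A = base[j], B = base[j+offset],
 * W = w[j*wincr]; then E = A + B*W overwrites A, F = A - B*W overwrites B.
 * Assumption (not from the listing): the W index restarts at 0 per group. */
void butterfly_stage(float complex *a, const float complex *w,
                     size_t groups, size_t offset, size_t wincr)
{
    for (size_t g = 0; g < groups; g++) {
        float complex *base = a + g * 2 * offset;
        for (size_t j = 0; j < offset; j++) {
            float complex p = base[j + offset] * w[j * wincr]; /* P = B*W */
            float complex e = base[j] + p;
            float complex f = base[j] - p;
            base[j] = e;
            base[j + offset] = f;
        }
    }
}
```

With offset=2, groups=1, wincr=1 and w = {(1,0), (0,-1)}, the second butterfly reduces to the trivial case handled at offset_2: B*W = (BI,-BR), with no multiplies needed.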
C File: dirr.f
C FFT - Decimation in Freq, radix-2, inplace, 1-dimen,
C REAL input
C Intel is not responsible for use nor misuse of this code.
C 8/14/89
C Inputs:
C A= REAL array of input, up to 1024 pts, single-prec float
C M= log of number of pts
C = (Number of stages of FFT)
C N = number of points. ie, N= 2**M = number of pts
C W= complex array of twiddle factors, length N/2.
C REV= 0 if bitreversed output ok. 1=must re-order output
C (REV will be ignored, and output will be properly ordered. Bit
C reversal WILL be done.)
C
C Outputs:
C A= complex fft of input A, but only the positive frequency half.
C Length = N/2+1 complex numbers. A(0:n/2)
C
      Subroutine dirr(a,m,N,W,REV)
      integer m,N,i,j,k,REV,wlimit
      integer offset,stage,groups,wincr,powers2(0:10)
      real a(N)
      complex w(N/2),temp

      data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV

C Twiddle factor array w(k) has (cos,-sin) of 2pi*k/N
C Assume the caller provides w(k) constants ALREADY initialized

C Pre-touch data, for 8kByte fft: (2048 points real)
      IF (N .gt. 1025) THEN
        call fetch(a,%VAL(n/2))
      ENDIF

      wlimit = 8*((N/2) - 1)

C "DO 20" stage-loop: doing Complex FFT on length N/2 array. Twiddles are
C for a length N array, so wincr gets scaled by 2.
      DO 20 stage = 1,m-1
        groups = powers2(stage-1)
C groups=number of times the twiddle factors are used, ie, the number of
C smaller DFTs the stage is split into.

C offset gets N/4,N/8,N/16,...
        offset = powers2(m-1-stage)
        wincr = groups * 2
        call difstep(a,w,groups,offset,wincr,wlimit)
   20 CONTINUE

      call bitrev(a,%VAL(M-1),n/2)
      call realfix(a,w,%VAL(n))

      RETURN
      END
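
The per-stage arguments that dirr passes to difstep follow directly from the DO 20 loop and can be tabulated in a few lines of C. The names stage_params, dirr_stage and dirr_wlimit are hypothetical and exist only for this sketch; the formulas are the ones in the Fortran above.

```c
/* Arguments dirr.f computes for one call to difstep (1 <= stage <= m-1). */
struct stage_params {
    unsigned groups;  /* powers2(stage-1): number of sub-DFTs in the stage */
    unsigned offset;  /* powers2(m-1-stage): N/4, N/8, N/16, ...           */
    unsigned wincr;   /* groups*2: twiddle table is for a length-N FFT     */
};

struct stage_params dirr_stage(unsigned m, unsigned stage)
{
    struct stage_params p;
    p.groups = 1u << (stage - 1);
    p.offset = 1u << (m - 1 - stage);
    p.wincr  = p.groups * 2u;
    return p;
}

/* Largest byte index into the W table: 8 bytes per complex, N/2 entries. */
unsigned dirr_wlimit(unsigned n)
{
    return 8u * (n / 2u - 1u);
}
```

Two invariants fall out of these formulas. First, groups*2*offset always equals N/2, the length of the complex array. Second, offset*wincr always equals N/2, the length of the twiddle table, so the W index wraps exactly once per group. Because the table size in bytes is a power of two, wlimit also works as an AND mask for that wrap.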
// realfix.ss: This is i860(tm) CPU assembly code to revise data from an
// N/2 length Complex FFT.
// (assumes the input data fed to Complex FFT was N real values)
//
// INTEL is not responsible for use nor misuse of this code.
//
// 8/14/89
// This 18-cycle-butterfly loop may be suboptimal.
//
// output = overwrite the data array used for input. Results are
// complex. Re0,Im0,Re1,Im1,..., Re(N/2),Im(N/2).
// NOTE that output array is 1 element longer than input.
//
// Input is H(k), output is F(k)...
// F(k)=.5*( H(k)+ Hconj(N/2-k) -j*(H(k) -Hconj(N/2-k) ) *Wconj(k) )
//
// Algorithm from "Numerical Recipes in C", by Flannery, Press, Teukolsky, and
// Vetterling, Cambridge Univ. Press 1988, p.417.


///* The C-version of realfix: */ void realfix_(a,w,n)
///*Input =
// * a(0:n+1): length n/2+1 complex array. Entries 0:n/2-1 are the complex FFT
// * result, in correct (NON BIT REVERSED) order. Entry n/2 is undefined.
// * w: length n/2 complex array of twiddles. (cos,-sin(2pi*k/n))
// * n: call-by-value, number of REAL input samples
// *Output =
// * a(0:n+1): length n/2+1 complex array.
// * Format is Re0,Im0,Re1,Im1,..., Re(N/2),Im(N/2).
// * NOTE: To generate entire N-length complex output spectrum, you can copy
// * conjugate of element (i) to element (N-i).
// */
//float a[], w[]; int n; { int aptr,bptr,wptr; float half=0.5,
// AR,AI,BR,BI, /* input values for A,B*/
// PR,PI,SR,SI,DR,DI, /*temporary differences,sums,products*/
// K,L,M,N, /*temporary products */
// ER,EI,ERD,EID,
// FR,FI,FRD,FID,
// WR,WI;
///*We do first and last elements as special case(Imag=0, W=(1,0))*/
// AR = a[0]; AI = a[1];
// a[0] = AR + AI; a[1] = 0;
// a[n] = AR - AI; a[n+1] = 0;
//for(aptr=2, bptr=(n-2), wptr=2; aptr < n/2; aptr +=2, bptr -=2, wptr +=2)
//{WR = w[wptr]; WI = w[wptr+1];
// AR = a[aptr]; AI = a[aptr+1];
// BR = a[bptr]; BI = a[bptr+1];
// /* aptr =2,4,6...,14; bptr=30,28,26,...,18 (if n=32) */
// /* Note that there is no need to revise the value at the middle of the
// list, as it is already correct. (.5*(H(n/4)+Hconj(n/4)) */
// SI = (AI + BI);
// DR = (BR - AR);
// K = WR*SI; L= WI*DR; PR = K-L;
// M = WR*DR; N= WI*SI; PI = M+N;
// SR = (AR + BR);
// DI = (AI - BI);
// ERD = SR+PR; ER = half*ERD;
// a[aptr] = ER;
// EID = DI+PI; EI = half*EID;
// a[aptr+1]= EI;
// FRD = SR-PR; FR = half*FRD;
// a[bptr] = FR;
// FID = PI-DI; FI = half*FID;
// a[bptr+1]= FI; } /*end of for-loop */ }
// /* End of C-code for realfix */
.text
.align .quad

define(astart,r16) //input data base address
define(wptr,r17) // pointer to W table. Because w-contents depend on N,
// we will assume the caller has initialized w() array.
define(N,r18) //
define(aptr,r20) //pointer to 1st component of butterfly (load)
define(bptr,r21) //pointer to 2nd component of bfly (load); DOWNCOUNTER

define(decrem,r24) //bla decrement
define(count,r25) // bla counter

define(WR,f18) //W (twiddle factor), real part
define(WI,f19) //" " , imag

define(AR,f12) //element A, real component
define(AI,f13) // " ", imag
define(ARo,f14) // extra A value, for prefetch (o="odd")
define(AIo,f15)
define(BR,f16) //element B, real component
define(BI,f17)

define(ER,f20) //Result of butterfly which overwrites AR
define(EI,f21) //" " " " AI


aa //constant 0.5 

define(FR,f24) //Result of butterfly which overwrites BR
define(FI,f25)
define(PR,f26)
define(PI,f27)
define(DR,f28)
define(DI,f29)
define(SR,f30) //Sum of A+B, real part
define(SI,f31) // " ", imag "

.data
.align .double
halfloc:: .float 0.5

.text
.align .quad
_realfix_::
    fst.q f12,-16(sp)++ //save "local" regs
    adds -4,r0,decrem //bla decrement

// We do not bother to initialize FP pipes to zero here, as we assume
// this routine is called after another, "safe", pipelined FP routine.

    pfld.l halfloc,f0
    pfld.d 8(wptr)++,f0 //skip W(0) intentionally. Is a trivial (1,0) value
// init pointers:
    adds 0,astart,aptr
    pfld.d 8(wptr)++,f0
    shl 2,N,bptr //bptr=total # bytes of input data.
    pfld.d 8(wptr)++,half //0.5 into an fpr
    adds bptr,astart,bptr // bptr points to a(N)
// here fetch first set of A,B,W before bla-loop
    pfld.d 8(wptr)++,WR
    fld.d 0(aptr),AR //for 1st and last elements
    adds -8,N,count // bla counter (predecrement by 2 butterflies once)

// Do n/4 butterflies: generates only N/2 elements of complex output, because
// the second N/2 are just complex conjugates of the first N/2.


// Guetiaeacudees episdcaiaersa: 


// WR = cos(), WI=-sin().
// DR = BR - AR; (difference of Real components of A,B).
// DI = AI - BI; (difference of Imag components)
// SR,SI = sum of A,B
// PR = K - L; where K= WR*SI, L=WI*DR
// PI = M + N; where M= WR*DR, N=WI*SI
// (ER,EI)=complex result to overwrite A
// (FR,FI)= " " " " B


first_fly:: //fill pipe.
// For 0th butterfly:
// AR = a[0]; AI = a[1];
// a[0] = AR + AI; a[1] = 0;
// a[n] = AR - AI; a[n+1] = 0;

// KR..KI..M1....M2....M3 A1....A2....A3....Write
    r2pt.ss f0,f0,f0 // 0 0
    mrm1p2.ss AR,AI,f0 // 0 0 - - ER0 - -
    mrm1s2.ss AR,AI,f0 // 0 0 0 FR ER - -
    fld.d 8(aptr)++,AR
    fld.d -8(bptr)++,BR
    d.pfadd.ss f0,f0,f0 // 0 0 0 0 FR ER -
    d.pfadd.ss f0,f0,ER // 0 0 0 0 0 FR ER0
    d.ra1p2.ss AI,BI,FR //
    nop
    d.mrm1s2.ss BR,AR,EI //
    fst.d ER,-8(aptr)
    d.mr2pt.ss WR,f0,FI // WR
    fst.d FR,8(bptr)
    d.ra1p2.ss BR,AR,SI // K1 -
    andh 0x8000,count,r0 //check for negative
    d.m12tpm.ss WI,DR,DR // L1 K1
    bnc endfix
    d.r2pt.ss half,DR,f0 //half M1 L1
    nop
    d.m12ttpa.ss WI,SI,SR // N1 M1
    nop
    d.i2st.ss f0,f0,f0 // f0 - N1 M1
    nop
// KR..KI..M1....M2....M3 A1....A2....A3....Write
    d.rat1s2.ss AI,BI,f0 // - - N1 DI1 PR1
    nop
    d.i2pt.ss f0,f0,f0 // PI1 DI1
    fld.d 8(aptr)++,AR
    d.r2ap1.ss SR,f0,PR // ERD PI1
    fld.d -8(bptr)++,BR
    d.ra1s2.ss SR,PR,DI // FRD ERD
    pfld.d 8(wptr)++,WR
    d.r2ap1.ss DI,f0,PI //
    nop
    d.ra1s2.ss PI,DI,f0 //
    nop
    d.ra1p2.ss f0,f0,f0 //
    nop
    d.ra1s2.ss f0,f0,f0 //
    bla decrem,count,fix_loop
    d.pfadd.ss f0,f0,FI // EI1


// Each butterfly = 1 complx multiply, 3 complx add, 1 real multiply
//= 8 multiply, 10 add/subtract
// 3 8-byte fetches (A, B, W)
// 2 8-byte stores (E, F)
//
// approx. 18 cycles per butterfly
//
fix_loop:: // KR..KI..M1....M2....M3 A1....A2....A3....Write
    d.mr2pt.ss f0,FI,ER // 0 FI1 EI1 FR1 - - - ER1
    nop
    d.mrm1p2.ss AI,BI,FR //
    nop
    d.mrm1s2.ss BR,AR,EI //
    fst.d ER,-8(aptr)
    d.mr2pt.ss WR,f0,FI // WR
    fst.d FR,8(bptr)
    d.ra1p2.ss BR,AR,SI // K2 -
    andh 0x8000,count,r0 //check for negative
    d.m12tpm.ss WI,DR,DR // L2 K2
    bnc endfix
    d.r2pt.ss half,DR,f0 //half M2 L2
    nop
    d.m12ttpa.ss WI,SI,SR //
    nop
    d.i2st.ss f0,f0,f0 //
    nop
// KR..KI..M1....M2....M3 A1....A2....A3....Write
    d.rat1s2.ss AI,BI,f0 // - - N2 DI2 PR2 -
    nop
    d.i2pt.ss f0,f0,f0 // PI2 DI2 PR2
    fld.d 8(aptr)++,AR
    d.r2ap1.ss SR,f0,PR // ERD PI2 DI2
    fld.d -8(bptr)++,BR
    d.ra1s2.ss SR,PR,DI // FRD ERD PI2
    pfld.d 8(wptr)++,WR
    d.r2ap1.ss DI,f0,PI // FRD ERD
    nop
    d.ra1s2.ss PI,DI,f0 //
    nop
    d.ra1p2.ss f0,f0,f0 //
    nop
    d.ra1s2.ss f0,f0,f0 //
    bla decrem,count,fix_loop
    d.pfadd.ss f0,f0,FI //

endfix::
// restore regs
    fiadd.ss f0,f0,f0 //exit DIM
    fld.q 0(sp),f12
    fiadd.ss f0,f0,f0 //last DIM pair
    adds 16,sp,sp
    bri r1
      PROGRAM FFTTEST
C file = real.f
C
C 1-D FFT TEST PROGRAM
C 8/14/89
C Intel assumes no responsibility for use or misuse of this code.


PARAMETER (IREV=1) 

character*8 really 

PARAMETER (REALLY='real') 
PARAMETER (REALLY='complex') 
PARAMETER (TIMEIT=0, CACHETIME=0) 

REALLY='real' means real-only input, otherwise assume complex input 

DATA IT/200000/ 

PARAMETER (N=2048,M=11) 
PARAMETER (N=1024,M=10) 

PARAMETER (N=512,M= 9) 

PARAMETER (N=256,M= 8) 

PARAMETER (N=128,M= 

PARAMETER (N=64,M= 

PARAMETER (N=32,M= 

PARAMETER (N=16, M=4) 

PARAMETER (PI=3.1415926536) 
COMPLEX X2(N) ,X(N) ,X3(N), W(N/2) 


Real ASQR(N) , ASQR2(N) ,XR(N+2) ,XR1(N+2) ,XR2(N+2) ,XR3(N +2) 
complex wtemp . 
real rtemp 


      PRINT *,' FFT test program ....'

      IF (IREV .eq. 0) THEN
        print *,'NOT counting time for bit-reversal.'
        print *,'DO NOT expect matching answers,without bit-reversal.'
      ELSE
        print *,'Time for bit-reversal included.'
      ENDIF

      print *,'Time for cache writeback and fills...'
      IF (CACHETIME .eq. 0) THEN
        print *,' NOT included, if iterating.'
      ELSE
        print *,' ... included.'
      ENDIF

      print *
      print *,'If iterating... Number of Iterations =',IT
      print *
      print *,'Number of Points =',N
      print *,'(',REALLY,' data) '
      print *

C Init twiddle factor array w(k) with (cos,-sin) of 2pi*k/N
      rtemp = 2.0*pi/N
      wtemp = CMPLX(cos(rtemp), -sin(rtemp))
      w(1) = (1.0, 0.0)
      DO 200 k = 2,N/2
  200 w(k) = wtemp * w(k-1)
cc    print *,' W (twiddle) initialization completed......'
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C INITIALIZE input data
C
      DO 100 I=1, N
c constant:
cc      Treal = 1.0
cc      Timag = 0.0
c squarewave:
cc      IF (I .lt. N/2) THEN
cc        Treal = 1.0
cc        Timag = 0.5
cc      ELSE
cc        Treal = 0.0
cc        Timag = 0.0
cc      ENDIF
c ramp function:
        Treal = I - 1.0
        Timag = Treal + 0.5
        IF (REALLY .ne. 'real') THEN
          X(I) = CMPLX(Treal, Timag)
          X2(I) = CMPLX(Treal, Timag)
          X3(I) = CMPLX(Treal, Timag)
        ELSE
          X(I) = CMPLX(Treal,0.0)
          X2(I) = CMPLX(Treal,0.0)
          XR(I) = Treal
          XR1(I) = Treal
          XR2(I) = Treal
          XR3(I) = Treal
        ENDIF
  100 CONTINUE
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
      CALL fft(X2, M, N)
cc Subroutine fft is Decimation-In-Time, Fortran version.
      CALL dirr(XR,M,N,W,1)
c (Assuming dirr produces inplace result, items 0:N/2 complex results)
cccccccccccccccccccccccccccccccccccccccccc
      IF (IREV .ne. 0) THEN
      IF (TIMEIT .eq. 0) THEN
        call vcompare(XR,X2,N/2+2)
        call cmags(XR,N/2+1,ASQR)
c cmags to take squared magnitude of complex values in X
        call cmags(X2,N,ASQR2)

c non-zero results:
        DO 700 I = 1,N/2+1
          IF ((ASQR(I) .GT. 1.0) .OR. (ASQR2(I) .GT. 1.0)) THEN
            WRITE (6,22) (I-1), ASQR(I), ASQR2(I)
   22 FORMAT (' I-1=',I4,' ASQR(I)= ',F14.2, ' ASQR2(I)= ',F14.2//)
            J = J+1
            IF (J .GT. 32) GOTO 725
          ENDIF
  700   CONTINUE

  725   CALL TIME
      ENDIF
      ENDIF

      IF (TIMEIT .ne. 0) THEN
cccccccccccccccccccccccccccccccccccccccccc
cc Timing loop follows:
        print *,' Start Ass.FFT'
        IF (CACHETIME .eq. 0) THEN
          DO 500 I = 1, IT,4
C Reuse same array, so cache fill and writeback time NOT included.
            CALL dirr(XR, M, N,W,IREV)
            CALL dirr(XR, M, N,W,IREV)
            CALL dirr(XR, M, N,W,IREV)
  500     CALL dirr(XR, M, N,W,IREV)
        ELSE
          DO 504 I = 1, IT,4
C Alternating between XR,XR1,XR2,XR3 should provide cache misses.
            CALL dirr(XR, M, N,W,IREV)
            CALL dirr(XR1, M, N,W,IREV)
            CALL dirr(XR2, M, N,W,IREV)
  504     CALL dirr(XR3, M, N,W,IREV)
        ENDIF
        print *,' END Ass. FFT'
cccccccccccccccccccccccccccccccccccccccccc
      subroutine vcompare(res,exp,n)
c VCOMPARE compares 2 vectors, prints out 1st few miscompares.
c
      integer n, errcnt
      real res(n), exp(n)

      write(6,12)
   12 format ('*** VCOMPARE: vector comparison beginning ***')

      data errcnt/0/
      do 30 i = 1,n
        if (AINT(res(i)) .ne. AINT(exp(i))) then
c {print out error, exit if alot already}
  120     print *,'*** Error in compare ***'
          write(6,121) i
  121     format(' Item number = ',I6)
          write(6,124) res(i), exp(i)
  124     format(' Res_=',F14.2,' Expected_ =',F14.2)
          errcnt = errcnt + 1
          if (errcnt .gt. 19) then
            return
          end if
        end if
   30 continue

      if (errcnt .eq. 0) then
  190   print *,' *** vector compares SUCCESSFUL ***'
      end if

   99 return
      end
C
C file: fft.f
C
C FFT routine from Rabiner & Gold, 1975, who copied it
C from Cooley, Lewis, Welch
C 6/02/89
C
C Decimation in Time, radix-2, inplace, 1-dimen
C Inputs:
C A= complex array of input, up to 1024 pts, single-prec float
C (maybe more than 1024, uncertain what limit is)
C M= log of number of pts
C = (Number of stages of FFT)
C N = number of points. ie, N= 2**M = number of pts
C
C Outputs:
C A= complex fft of input A, in NON-bit-reversed order.
C
C w (twiddle factor) calculated by recursion. Supposedly takes 15% more
C operations than keeping entire twiddle array as constants pre-allocated.
      Subroutine fft(a,m,n)
      integer m,n,i,j,k,ndiv2,powers2(0:10)
      integer iplus,offset,stage,index1,groups
      complex a(n),wtemp(2),w(11),temp

C Init twiddle factor array w() with (cos,-sin) of pi,pi/2,pi/4,...
      data w(1) /(-1.0,0.0)/
      data w(2) /(0.0,-1.0)/
      data w(3) /(0.7071068,-0.7071068)/
      data w(4) /(0.9238795,-0.3826834)/
      data w(5) /(0.98078535,-0.19509053)/
      data w(6) /(0.9951847,-0.0980171)/
      data w(7) /(0.9987955,-0.0490677)/
      data w(8) /(0.9996988,-0.0245412)/
      data w(9) /(0.9999247,-0.0122715)/
      data w(10) /(0.9999812,-0.0061359)/
      data w(11) /(0.9999953,-0.003068)/

      data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV

C Setup for bit-reversal loop:
      ndiv2 = n/2
      j = 1

C "DO 7" loop to in-place-bit-reverse-shuffle input
      DO 7 i=1, n-1
        IF (i .lt. j) THEN
          temp = a(j)
          a(j) = a(i)
          a(i) = temp
        ENDIF
        k = ndiv2
C "While (j .gt. k)" /*decrease j by 2**something */
    6   IF (j .gt. k) THEN
          j = j-k
          k = k/2
          GOTO 6
        ENDIF
C Add next lower power of 2 to j
    7 j = j+k


C Special case for stage 1: no complex multiplies, simple add
C (Performance enhancement)
      groups = 2
      offset = 1
      index1 = 1
C i-loop iterates N/2 times for 1st stage (and would do twice N/4 x for 2nd)
CVD$ NODEPCHK
      DO 8 i=1,n,2
        iplus = i + 1
        temp = a(iplus)
        a(iplus) = a(i) - temp
    8 a(i) = a(i) + temp

C Special case for stage 2: no complex multiplies, simple add
C (Performance enhancement)
      groups = 4
      offset = 2
      index1 = 1
C i-loop iterates N/4 times for 2nd stage
C 1st call to i-loop, in stage 2: index1=1, w=(1,0)
CVD$ NODEPCHK
      DO 90 i = 1,n,4
        iplus = i + 2
        temp = a(iplus)
        a(iplus) = a(i) - temp
   90 a(i) = a(i) + temp

      index1 = 2
CVD$ NODEPCHK
CVD$ NOVECTOR
      DO 92 i = 2,n,4
        iplus = i + 2
        temp = CMPLX(AIMAG(a(iplus)), -REAL(a(iplus)))
        a(iplus) = a(i) - temp
   92 a(i) = a(i) + temp
CVD$ VECTOR

C "DO 20" stage-loop executed once for each of the (m) stages of FFT
C (Except 1st and 2nd stage)
C offset gets 4,8,16,32,64,128,256...
      DO 20 stage = 3,m
        groups = powers2(stage)
        offset = groups/2
        wtemp(1) = (1.0, 0.0)
C One twiddle seed (W) calc per stage.
C We pre-allocated w(12)-array with those values, avoid cos/sin calls

        DO 20 index1 = 1,offset
C "DO 10" i-loop does each butterfly of each stage, with varying twiddles
C i-loop iterates N/2 times for 1st stage, N/4 x for 2nd, N/8 x for 3rd
C stage, N/16 x for 4th stage,... 1 time for last stage.
CVD$ NODEPCHK
CVD$ ALTCODE
          DO 10 i = index1,n,groups
            iplus = i + offset
            temp = a(iplus) * wtemp(1)
            a(iplus) = a(i) - temp
   10     a(i) = a(i) + temp
   20 wtemp(1) = wtemp(1) * w(stage)
      RETURN
      END
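
The "DO 7" bit-reversal shuffle at the top of fft.f translates almost line for line into C once the indices are made zero-based. The sketch below uses the hypothetical name bit_reverse_shuffle and assumes n is a power of two. The running index j is the bit-reversed counterpart of i, updated by stripping set bits from the top and adding the next lower power of two.

```c
#include <complex.h>
#include <stddef.h>

/* In-place bit-reversal permutation of a[0..n-1]; n must be a power of 2. */
void bit_reverse_shuffle(float complex *a, size_t n)
{
    size_t j = 0;                       /* bit-reversed counterpart of i */
    for (size_t i = 0; i + 1 < n; i++) {
        if (i < j) {                    /* swap each pair only once */
            float complex temp = a[j];
            a[j] = a[i];
            a[i] = temp;
        }
        size_t k = n / 2;
        while (k > 0 && j >= k) {       /* decrease j by 2**something */
            j -= k;
            k /= 2;
        }
        j += k;                         /* add next lower power of 2 */
    }
}
```

For n = 8 the input order 0,1,2,3,4,5,6,7 becomes 0,4,2,6,1,5,3,7.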


      Subroutine cmags(a,n,asqr)
C Complex magnitude squared.
C Inputs:
C A= complex array of input, single-prec float
C N = number of input points (and output points)
C Output:
C asqr = real squared magnitude (R*R + I*I), N elements, single-prec float

      integer n,i
      real asqr(n)
      complex a(n)

      DO 100 i=1, n
      asqr(i) = (REAL(a(i))*REAL(a(i))) + (AIMAG(a(i))*AIMAG(a(i)))
  100 CONTINUE
      RETURN
      END
## makefile for i860(tm) CPU FFTs (for Unix V/386 programming environment)
## 8/7/89
##
GH=/usr/i860/bin
GHL=/usr/i860/lib
CC=$(GH)/c860
FC=$(GH)/f860

CFLAGS= -OLM -X595 -X405 -X188 -X370
FFLAGS= -OLM -X370 -X3935 -X71 -X422
## -X71 uses Single-precision math routines

FLFLAGS= -Mx map -e start
LFLAGS= -Mx map -e main
CLIB=$(GHL)/libc.a
MLIBPSR=$(GHL)/860mtlib.a
MLIB=$(GHL)/libm.a
FLIB=$(GHL)/libf.a

ASM=$(GH)/as860
FLINK=$(GH)/ld860 $(FLFLAGS)
RT=$(GHL)/s5lib.a
LIBS= $(FLIB) $(MLIBPSR) $(MLIB) $(CLIB) $(RT)
LIBCC= $(MLIB) $(CLIB) $(RT)
## NOTE: Order of linked files is CRUCIAL, other orders may give errors

.SUFFIXES:
.SUFFIXES: .f .c .s .ss .o .8

.IGNORE:
## .ignore causes make to ignore error codes from compilers

## To test Fortran plus assembler-fft-stage version:
FILE= ffttest.o fft.o diff.o bitrev.o difstep.o start.o time.o

## To test all-Fortran version of fft:
##FILE= ffttest.o fft.o diff.o difstepf.o start.o time.o

## To test REAL-input version of fft:
RFILE= real.o fft.o dirr.o realfix.o difstep.o bitrev.o start.o time.o

.f.o:
	$(FC) $(FFLAGS) $*.f
	$(ASM) -x -o $*.o $*.s

.c.o:
	$(CC) $(CFLAGS) $*.c
	$(ASM) -x -o $*.o $*.s

.s.o:
	m4 $*.s >temp2.s
	$(ASM) -x -o $*.o temp2.s

ffttest.8: $(FILE)
	$(FLINK) -o ffttest.8 $(FILE) $(LIBS)

real.8: $(RFILE)
	$(FLINK) -o real.8 $(RFILE) $(LIBS)

clean:
	rm -f *.o *.8

.ss.o:
	m4 $*.ss >temp.s
	$(ASM) -x -o $*.o temp.s
//start.ss
// 8/18/89
// Fortran runtime startoff routine
//
.text
.globl start
.globl finish
start::
    orh h%_stack+262128+262144,r0,sp
    or l%_stack+262128+262144,sp,sp
    adds -16,sp,sp
    st.l r1,12(sp)
    call main
    nop
finish::
    call _exit
    nop
.file "start.c"


-quad | 
~Stack ,262144+262144 


/* file: time.c. Purpose: establish a label to use for breakpoints */
long time_(x)
long *x;
{ x = x+4;
return((long) x);
}

long timestop_(x)
long *x;
{ x = x+4;
return((long) x);
}

AP-452
APPLICATION NOTE

Designing a Memory Bus Controller
for the 82495/82490 Cache

MARK ATKINS
ISIC SILAS
CHRIS KARLE

December 1991

PRELIMINARY

Order Number: 240957-001


CONTENTS                                                   PAGE
1.0 BACKGROUND ........................................... 2-450
2.0 WHY A CUSTOM BUS INTERFACE? .......................... 2-451
3.0 GUIDELINES ........................................... 2-451
    Shared Bus Interconnect .............................. 2-452
4.0 MBC BLOCK DIAGRAM .................................... 2-452
5.0 DESIGN EXAMPLE: A UNIPROCESSOR MBC ................... 2-454
6.0 DESIGN EXAMPLE: A MULTIPROCESSOR MBC ................. 2-454
7.0 MBC FUNCTIONS ........................................ 2-456
    MBC Functions for Uni and Multiprocessors ............ 2-456
    Reset and Configuration Control ...................... 2-456
    Intel486 DX CPU Resets ............................... 2-457
    FLUSH# (and SYNC#) ................................... 2-457
    Bus Error or Timeout Detection ....................... 2-458
    Scenarios Requiring MBC Action ....................... 2-458
    Transfer Tracking .................................... 2-458
    Clock Boundaries and Synchronization ................. 2-459
    Synchronizer Delays .................................. 2-461
    BRDY# Generation ..................................... 2-462
    Pipelining ........................................... 2-464
    Pipelining the MBC-to-82495 .......................... 2-464
    Pipelining the M-bus ................................. 2-464
    M-bus Arbitration .................................... 2-464
    Sequencing ........................................... 2-465
    Flowchart of MBC Algorithm ........................... 2-467
    Cacheability ......................................... 2-468
    Snooping ............................................. 2-468
    Snoop Handshaking .................................... 2-468
    Bus Size Adaptation .................................. 2-468
    Bus Signal Levels .................................... 2-468
8.0 MBC FUNCTIONS FOR MULTIPROCESSORS .................... 2-469
    Snooping Results ..................................... 2-469
    Snoop Window Time .................................... 2-470
    Read for Ownership ................................... 2-470
    Cache-to-Cache Transfers ............................. 2-471
    Snoop Filtering ...................................... 2-471
    Split Transaction .................................... 2-472
    Memory Cycle Abort ................................... 2-472
    Locking .............................................. 2-472
    Bus Lock vs. Address Lock ............................ 2-473
    KLOCK# De-Assertion .................................. 2-473
    CPLOCK# .............................................. 2-473
9.0 MORE ALTERNATIVES .................................... 2-474
    M-bus Clocking ....................................... 2-474
    Strobed or Clocked M-bus ............................. 2-474
    Line Size and M-bus Width ............................ 2-474
    Writeback ............................................ 2-474
10.0 MBC DIFFERENCES FOR i860 XP CPU VERSUS
     Intel486 DX CPU ..................................... 2-475
11.0 SUMMARY ............................................. 2-475
12.0 BIBLIOGRAPHY ........................................ 2-476
APPENDIX A: Questions and Answers on MBC Design .......... 2-477
APPENDIX B: Intel486 DX CPU Uniprocessor MBC Design ...... 2-480
APPENDIX C: i860 XP CPU Dual-Processor MBC ............... 2-481




FIGURES                                                    PAGE
Figure 1    CPU + 82495 + 82490 Systems .................. 2-450
Figure 2    CPU + 82495 + 82490 Core ..................... 2-451
Figure 3    System Type and Bus Requirements ............. 2-452
Figure 4    Generic Block Diagram of MBC ................. 2-453
Figure 5    Block Diagram of Uniprocessor MBC ............ 2-454
Figure 6    Block Diagram of Multiprocessor MBC .......... 2-455
Figure 7    Synchronizer Hardware ........................ 2-461
Figure 8    Data Transfers, M-bus Width = CPUbus,
            MCLK = CLK ................................... 2-463
Figure 9    Data Transfers, M-bus Width = CPUbus,
            MCLK < CLK ................................... 2-463
Figure 10   Data Transfers, M-bus Width = 2 CPUbus ....... 2-463
Figure 11   Data Transfers, M-bus Width = 4 CPUbus ....... 2-463
Figure 12   Data Transfers for Non-Pipelined M-bus ....... 2-464
Figure 13   Data Transfers for Pipelined M-bus ........... 2-464
Figure 14   MBC Signals and Protocol Layers .............. 2-466
Figure 15   Creating Snoop Results
Figure 16   Snoop Waveforms
Figure C-1  Pinout Environment of MBC
Figure C-2  Non-Aborted Read Cycles
Figure C-3  Aborted Non-Pipelined Cycles
Figure C-4  Aborted Pipelined Cycles
Figure C-5  Potentially Allocatable Write
Figure C-6  Non-Allocatable Write
Figure C-7  Interprocessor Communications in Two
            Processor System
Figure C-8  Extension Glue
State Diagrams
PLD Codes
Appendix C Schematic

TABLES
Table 1     Functions of the Memory Bus Controller
Table 2     Clocked vs. Strobed M-bus Tradeoffs


1.0 BACKGROUND 


The Intel 82495 Cache Controller and 82490 Cache RAM form a high-speed
cache subsystem for the Intel486 DX CPU (82495DX/490DX) or the i860 XP
CPU (82495XP/490XP). The reader should be familiar with these chips, as
described in:

1) i860 XP CPU Microprocessor Data Sheet (Intel order #240874)

2) Intel486 DX Microprocessor Data Sheet (Intel order #240440)

3) 82495XP Cache Controller/82490XP Cache RAM Data Sheet (Intel order
   #240956, June 1991)
   or Intel486 DX CPU Microprocessor Cache-Chip Set Data Sheet (Intel
   order #241084, June 1991)


Diagrams of systems containing the 82495 and 82490 appear in Figure 1,
and a more detailed diagram of the CPU/82495/82490 core appears in
Figure 2. (Note: for simplicity, the 82495XP/82490XP and
82495DX/82490DX will be referred to generally as the 82495/82490; the
XP or DX should be inferred from the CPU being used.) In such systems,
the 82495 controls a cache external to the CPU, and includes the cache
tags. It can interface gluelessly to an Intel486 DX CPU or i860 XP CPU
microprocessor, allowing the processor bus to run at 50 MHz with zero
wait-states, while the memory bus can remain at a lower frequency. Both
writeback and writethrough protocols are supported. Operations can
proceed concurrently on the local CPUbus and the shared memory bus. All
requisites for multiprocessors are included in the 82495, Intel486 DX
CPU, and i860 XP CPU, but the 82495 is also useful as a performance
enhancement for a uniprocessor system.


The 82490 cache RAM contains 32 kBytes per chip, and is used in groups
of 4, 8, or 16 to implement caches from 128 to 512 kBytes. It supports
two-way associativity, delayed writebacks, burst transfers, and
boundary scan test. The 82490 contains much more than RAM cells: it
includes various buffers, queues, and support for several bus
protocols. It is two-ported, with simultaneous access on both the CPU
side and the Memory-Bus side. The cache optionally supports parity
using additional 82490 chips.
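The cache sizes quoted above follow directly from the per-chip capacity. As a minimal sketch (the function and constant names are illustrative, not from any Intel tool, and the optional parity chips are not counted), the arithmetic can be checked like this:

```python
# Illustrative arithmetic only: names below are assumptions for this sketch.
CHIP_KBYTES = 32                 # capacity of one 82490, as stated in the text
VALID_GROUP_SIZES = (4, 8, 16)   # chip counts named in the text


def cache_kbytes(num_chips: int) -> int:
    """Return the data cache size in kBytes for a group of 82490 chips."""
    if num_chips not in VALID_GROUP_SIZES:
        raise ValueError(f"82490s are used in groups of {VALID_GROUP_SIZES}")
    return num_chips * CHIP_KBYTES


if __name__ == "__main__":
    for n in VALID_GROUP_SIZES:
        print(f"{n:2d} chips -> {cache_kbytes(n)} kBytes")
```

The smallest and largest groups give the 128 kByte and 512 kByte endpoints mentioned in the paragraph.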


Configuration options allow a variety of memory bus widths (32 to 144
bits), cache line widths (16 to 128 bytes), and asynchronous or
synchronous transfers. The configuration is selected by the polarity of
various pins at reset time.


Figure 1. CPU + 82495 + 82490 Systems
[Recoverable panel titles: 1. Uniprocessor; 3. Heterogeneous
Multiprocessor. Recoverable labels: Bus Controller; bus width 32, 64,
or 128.]


The Memory Bus Controller (MBC) portion of the system interfaces the
82495 and CPU to the system bus. The MBC converts bus status and
command lines into requests to the 82495, for example, to monitor the
progress of an ongoing bus transaction from another CPU subsystem to
ensure consistency with the 82495 + 82490 cache contents. Likewise, the
MBC adapts 82495 requests to the bus protocol and arbitrates for
ownership of the bus. Most CPU requests will not require MBC action;
only I/O cycles, cache bypass requests, and 82495 cache misses are
forwarded by the 82495 to the MBC, while external cache hits are
handled entirely by the 82495 + 82490.


2.0 WHY A CUSTOM BUS INTERFACE?


Clearly the entire interface to a memory bus (abbreviated M-bus) could
have been incorporated in the 82495 and 82490 chips. This approach has
been followed by some other cache chipsets.

However, such integration suffers from inflexibility and bandwidth
limitations. As shown in Figure 3, the performance and cost targets of
the system determine the size and complexity of the bus, so if the bus
is "hard-wired" into the cache controller chip, it will be too costly
for small systems and too slow for larger systems. With the bus
interface implemented separately, it can be a complex ASIC for a
high-bandwidth complex system, or a few EPLDs for a PC. The same cache
controller can improve performance of a variety of bus-based CPUs.

For a desktop PC, a simple 32-bit memory bus is adequate. For a
workstation or small multiprocessor of two CPUs, a faster 64-bit bus
may be required to give adequate bandwidth for graphics frame buffers
and intensive numeric calculations. Bus bandwidth requirements grow as
the MIPS rating of each CPU in a system grows; for example, a bus
adequate for 12 386 CPUs may be too slow for 6 Intel486 DX CPUs, as
they process far more data per second.


Figure 2. CPU + 82495 + 82490 Core
[Recoverable labels: Intel486 or i860 XP CPU; CPU Bus; bus width 32,
64, or 128.]
A large multiprocessor of 6 or more CPUs needs a wide and fast bus such
as Futurebus+, with split-transaction capability to prevent bus
bottlenecks from slowing the performance of every processor.
Hierarchies of buses and caches can further allow more CPUs with
reasonable performance increases as CPUs are added. A Futurebus+
hierarchy maintains concurrent transactions on each bus, and "bridge"
caches at the junctions of buses echo them from bus to bus when the
bridge detects that one transaction may affect cached copies on the
other bus.

Compatibility with existing buses is often crucial in product design,
so that new faster components can plug into existing machines and I/O
devices. The flexible 82495/82490 bus interface allows compatibility as
well as extension.

Thus the 82495 and 82490 will be used in a wide variety of systems,
including standard buses like Futurebus+. For proprietary buses, the
"proprietor" can design an ASIC or PAL MBC incorporating the required
features.


3.0 GUIDELINES 


This document exists to clarify the necessary components and tradeoffs
of a Memory Bus Controller. The example designs here have not been
tested, and signal definitions of the i860 XP CPU, Intel486 DX CPU,
82495, and 82490 chips are subject to change.

The memory bus controller is not allowed to use (and thus add
capacitance to) any of the CPU pins used by the 82495/82490, except
those listed in the 82495 Data Sheet [82495/490DS] description of the
BLE# pin. Only the CPU pins BE7-0#, PWT, PCD, LEN, CACHE#, BRDY#, PCYC,
and CTYP have sufficient timing margin to tolerate the MBC load.


Figure 3. System Type and Bus Requirements
[Chart of bus requirements against system size, from small systems
(2-3 CPUs) through medium multiprocessors (4-8 CPUs) to large systems
(8+ CPUs). Recoverable row labels and entries, in order of increasing
system size:
  CPU<->Memory Interconnect: simple bus; pipelined bus; crossbar or bus
    hierarchy
  Bus Width: 32 bit; 32 or 64; 64 or 128; 64 or 128 bit
  Frequency: 20-40 MHz; 33 MHz or more
  Cache: WriteThru; WriteBack; 82495 + 82490; 3rd Level Cache
  Arbitration: Central (HOLD/HLDA); Distributed Arbitration, Bus Parking
  LOCKing: Bus Lock; Address Lock
  Error Detect: Parity; ECC on Memory, Retry
  Bus Protocol: Simple; Pipelined; Split Transaction
  Extra Features: Read-for-Ownership; Cache-to-Cache Transfers;
    External FIFOs]


Shared Bus Interconnect


When used in a multiprocessor, the 82495 assumes a shared-memory,
shared-bus environment so that it can observe and "snoop" accesses by
others which might conflict with the memory locations it has cached. In
a crossbar or other multipath interconnect, shared-bus coherency can be
emulated for the 82495, or it can be used non-coherently. Either a
centralized directory or a hierarchy of buses and caches can do the
emulation. A directory would keep a record, for each line of main
memory, of the caches which have the line. When a cache first writes to
a line of memory, the central directory broadcasts an invalidation
message to all other caches containing that line. [Agarwal88]
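The directory bookkeeping described above can be sketched in a few lines. This is only an illustration of the idea under simplifying assumptions: the class and method names are hypothetical, cache state is reduced to "holds the line or not", and the invalidation "broadcast" is modeled as a returned list of cache IDs.

```python
from collections import defaultdict


class Directory:
    """Toy central directory: for each memory line, which caches hold it."""

    def __init__(self):
        # line address -> set of cache IDs that currently hold the line
        self.holders = defaultdict(set)

    def record_fill(self, cache_id, line):
        """Note that a cache has read (filled) a line."""
        self.holders[line].add(cache_id)

    def write(self, cache_id, line):
        """Handle a write: return the caches that must be invalidated.

        After the write, the writer is the only recorded holder, so a
        second write by the same cache needs no further invalidations.
        """
        to_invalidate = sorted(self.holders[line] - {cache_id})
        self.holders[line] = {cache_id}
        return to_invalidate


if __name__ == "__main__":
    d = Directory()
    for cache in (0, 1, 2):
        d.record_fill(cache, 0x100)
    print(d.write(1, 0x100))   # caches 0 and 2 receive invalidations
    print(d.write(1, 0x100))   # nothing left to invalidate
```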


4.0 MBC BLOCK DIAGRAM 


Shown in Figure 4 is a high-level block diagram of the functions and
interfaces involved in the Memory Bus Controller. Part of the MBC
operates on the high-speed clock (CLK) which the CPU and 82495 use.
While the M-bus could use the 50 MHz CPU CLK, such a fast M-bus is hard
to design. The part of the MBC which interacts with the memory bus
protocol runs on an M-bus clock (MCLK), if that protocol is clocked.
Also possible is an unclocked M-bus protocol using the 82495/82490 in
"strobed" mode. The MBC contains synchronizers and a few signals which
cross between the two clock domains. Synchronizers, consisting of
specially-designed flip-flops, allow a clocked state machine to use
data which may be transitioning near the edge of the clock.
Unsynchronized data can cause metastability in latches, where their
output changes slowly and unpredictably.

Figure 4. Generic Block Diagram of MBC
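To make the cost of synchronization concrete, here is a small behavioral sketch of a two-stage ("dual-rank") synchronizer, the arrangement named in the multiprocessor design example of Section 6.0. It is an assumption-laden model, not the circuit in this note: the class name is hypothetical, and it captures only the two-clock logical latency, not the analog metastability that the first rank is there to absorb.

```python
class DualRankSynchronizer:
    """Behavioral model: two flip-flops in series on the receiving clock."""

    def __init__(self, reset_value=0):
        self.rank1 = reset_value   # samples the asynchronous input
        self.rank2 = reset_value   # re-samples rank1 one clock later

    def clock(self, async_in):
        """Advance one receiving-clock edge and return the safe output."""
        self.rank2 = self.rank1    # second rank takes the settled value
        self.rank1 = async_in      # first rank may go metastable in hardware
        return self.rank2


if __name__ == "__main__":
    sync = DualRankSynchronizer()
    samples = [0, 1, 1, 1, 0, 0]   # input as seen at successive clock edges
    print([sync.clock(s) for s in samples])
```

In this model a change on the input appears at the output on the second clock edge after it is first sampled, which is the delay a state machine in the receiving domain has to budget for.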
5.0 DESIGN EXAMPLE: A UNIPROCESSOR MBC


A simple MBC design example is an adapter to allow plugging a
daughtercard module with an Intel486 DX CPU, 82495, and 4 82490s into
an Intel486 DX CPU microprocessor PGA socket. The memory bus is an
Intel486 DX CPU bus, allowing the external cache to be a
performance-enhancing option. It assumes a "divided synchronous" M-bus
clock, where the M-bus runs at 1/2 the CPU CLK speed. Thus no
synchronizers are needed. The MBC uses both the CPU CLK and the M-bus
MCLK.

This design requires:
• 1 74F377 latch
• 6 PLDs containing 10 state machines
• 2 chips for clock generation, not part of the MBC

Approximately 70 signal pins connect the MBC block to the CPU, cache,
and memory. Only a uniprocessor is supported, although the bus protocol
and MBC could be enhanced for multiprocessing coherency. Figure 5 shows
a block diagram. Details of the design can be found in Appendix B.


6.0 DESIGN EXAMPLE: A MULTIPROCESSOR MBC


An i860 XP CPU multiprocessor-capable MBC (Figure 6) using an M-bus
similar to the i860 XP CPU bus is proposed. For clocking, it uses an
MCLK of 33 MHz, totally asynchronous to the 50 MHz CPU CLK. It could
therefore be upgraded to faster CPU CLK rates in the future without
changing the design or M-bus.

The design requires:
• 2 74F377 octal latches (for BE7-0#, etc.)
• 2 74AS4374 dual-rank-synchronizer octal registers
• 16 PLDs
• 2 GA1110 clock drivers for clock distribution

These components could be integrated into a single ASIC chip, as about
120 signals connect to the MBC. The MBC can be used for a uniprocessor
or multiprocessor i860 XP CPU design. Details can be found in
Appendix C.


Figure 5. Block Diagram of Uniprocessor MBC

Figure 6. Block Diagram of Multiprocessor MBC
7.0 MBC FUNCTIONS


Table 1 shows the responsibilities of the Memory Bus Controller for
uniprocessors and multiprocessors (MP). The multiprocessor features
exist mainly to prevent bus over-utilization. However, some of the jobs
common to both are more complex in MP, for example, arbitration and
snooping. The pin lists in the table are not exhaustive.
Table 1. Functions of the Memory Bus Controller

MBC Functions for Uni and Multiprocessors                    Pins
    1. RESET and configuration                               RESET, HOLD, CAHOLD
    2. FLUSH# and SYNC#                                      CAHOLD, FSIOUT#, FLUSH#, SYNC#
 ** 3. Bus error detection, retry                            PCHK#, BERR
    4. CPU transfer tracking (burst count)                   CLEN1:0, RDYSRC, BRDY#
    5. M-bus transfer tracking (burst count)                 CRDY#, MBRDY#
       (including writeback, allocation)
    6. Synchronization between clock domains                 BGT#, CADS#, MBRDY#
 ** 7. Memory-bus pipelining                                 BGT#, CNA#, MEOC#, CRDY#
 ** 8. MBC-to-82495 pipelining                               CNA#, MALE
    9. Memory bus arbitration                                BGT#
   10. Cacheability decode                                   KWEND#, MKEN#
 **11. Redrive bus signals for BTL or ECL levels
       or for capacitive loads
 **12. Packing (convert 32-bit M-bus for 64-bit              MBRDY#
       82490 size, or 8-bit ...)
 **13. Bus messages (interrupts, flushes)                    INT(R), FLUSH#
 **14. Boundary scan and selftest                            TCK, TMS, SLFTST#
 **15. Performance monitoring (M-bus utilization,            CW/R#, CADS#
       read vs. write)
   16. Snoop handshake (snooping DMA or other ...)           SNPSTB#, SNPCLK, SNPCYC#
   17. Snoop writebacks                                      MHITM#, SNPADS#

Additional MBC Functions for Multiprocessors                 Pins
    M1. Snoop window (as master)                             SWEND#, MWB/WT#
    M2. Backoff 82495 when request was to M-line             MAOE#
        in another 82495
 ** M3. Snoop filtering (via SMLN#)                          SMLN#
 ** M4. Cache-to-cache transfers (CTCT)                      DRCTM#, MBAOE#
 ** M5. Read-For-Ownership (RFO)                             PALLC#, DRCTM#, MFRZ#
 ** M6. Split transactions (requires duplicate               CWAY
        tag array)
 ** M7. Memory cycle abort (after MHITM#)                    MHITM#
    M8. LOCK# protection                                     KLOCK#, CAHOLD, SNPCYC#
 ** M9. LOCK# de-assertion (for back-to-back                 KLOCK#
        Intel486 DX CPU locks)
 **M10. CPLOCK# (Intel486 DX CPU only)                       CPLOCK#
 **M11. Snoop during LOCK#                                   KLOCK#
 **M12. Multiprocessor interrupts (for message-based         INT, NMI (BERR)
        interrupts or TLB shootdown)

 ** = optional and implementation dependent


MBC Functions for Uni and Multiprocessors


Reset and configuration control includes strapping of the following
pins to resistors at Vcc or Ground, or "temporary strapping" of
multifunction pins whose state during the last 16 clocks before the
falling edge of RESET determines the 82495, 82490, or CPU
configuration. The circuit feeding RESET to these chips should keep it
active for at least 16 CLK periods. "Temporary strapping" means
including RESET or ARESET in the logic equation for the pin. The
multifunction pins are indicated with brackets [ ] below:


i860 XP CPU pins:
PEN#, FLINE#, HOLD

Intel486 DX CPU pins:
RDY#, BOFF#, BS8#, BS16#, HOLD, FLUSH#

82495 pins:
CFG3, CFG2[KWEND#], CFG1[SWEND#], CFG0[CNA#], CPUTYP[HITM#],
FPFLDEN[FPFLD#], NCPFLD#[FLUSH#], SNPMD[SNPCLK], C490LDRV[BGT#],
MEMLDRV[SYNC#], SLFTST#[CRDY#], TEST, HIGHZ#[MBALE], CACHE#
(NOTE: the FPFLDEN pin is defined for Intel486 DX CPU as
PLOCKEN[CPLOCK#]. The 82495XP does not use CFG3 for configuration in
i860 XP CPU systems.)

82490 pins:
MTR4/TR8#[MSEL#], MX4/MX8#[MZBT#], MSTBM[MCLK], MEMLDRV[MFRZ#], PAR#,
MOCLK, (BOFF#, HITM#)


Intel486 DX CPU: The "unused" Intel486 DX CPU inputs (RDY#, BS8#,
BS16#, BOFF#) with the 82495 should be connected as described in the
Intel486 DX CPU Chipset EDS.

The Intel486 DX CPU FLUSH# input should be tied high, unless the system
requires FLUSH messages from the M-bus to be interpreted. In that case
the MBC must assert the FLUSH# inputs to both the Intel486 DX CPU and
the 82495, because the 82495 does not do back-invalidates to the
Intel486 DX CPU for FLUSH#. During RESET, the Intel486 DX CPU FLUSH#
input must be kept high to avoid putting the CPU in
tristate-output-test-mode (Intel486 DX CPU Data Sheet Section 8.4).


i860 XP CPU: The i860 XP CPU input PEN# (Parity trap ENable) must be
strapped high unless the memory data bus feeding the 82490s always
contains good parity and the i860 XP CPU system uses 2 82490s in parity
mode; in the latter case, strap PEN# low. HOLD should be strapped low
and FLINE# strapped high, as those features cannot be used with the
82495.


82495: The multiplexed 82495 pin FPFLDEN[FPFLD#] becomes an output
after RESET, so the PAL or ASIC which creates FPFLDEN must float it as
soon as RESET=0. The same multiplexing applies to Intel486 DX CPU mode,
where the pin is named PLOCKEN[CPLOCK#]. Likewise, the multiplexed
input FLUSH#[NCPFLD#] should be driven high in the same clock that
RESET falls, to prevent an unnecessary 82495 cache flush. In Intel486
DX CPU systems, the 82495 input CACHE# must be tied low and
HITM#[CPUTYP] must be tied low, as it signals CPUTYPE to the 82495.


82490: The 82490DX inputs HITM# and BOFF# must be tied high in an
Intel486 DX CPU system, as they exist to support the i860 XP CPU
writeback cache. With an i860 XP CPU, the 82490XP input BOFF# comes
from the 82495XP, but HITM# from the i860 XP CPU feeds both the 82495XP
and the 82490XP.


The 82490 input MOCLK must also be tied low or to a delayed version of
MCLK, if clocked-M-bus mode is used. This is because the 82490 senses
the state of MOCLK after RESET ends: if MOCLK stays low, the 82490 uses
MCLK to drive MDATA. If MOCLK toggles after RESET, the 82490 will use
MOCLK to switch output data. Using a delay line external to the 82490
to generate MOCLK from MCLK gives the design a longer hold time at
other receivers of MDATA in the system. For a clocked M-bus
(non-synchronous to CLK), the undelayed MCLK should be connected to the
82495's SNPCLK input and should be toggling during RESET to tell the
82495 to snoop in clocked mode.


During RESET, the 82495 and 82490 will float the bidirectional lines
they share with the CPU, such as CDATA and A31:A3. Thus driver
contention is avoided. The RESET input should be synchronous to CLK and
deasserted to the 82495, 82490s, and CPU at the same time, to assure
that the configuration controls get properly passed between them.


For Intel486 DX CPU resets, refer to [82495/490DS] for the sequencing
of HOLD, HLDA, CAHOLD, and RESET required to reset only the processor
without destroying 82495 cache contents. For that purpose, a separate
RESET line is advised for the CPU and the 82495/82490. The CPU RESET
line must be wired to the WRMRST input of the 82495, to force the 82495
to assert the BRDY1# input to the CPU during a reset of the CPU only
(the CPU uses the BRDY1# input during RESET to know of the 82495's
existence). The HOLD input of the Intel486 DX CPU and i860 XP CPU
processors should be kept low during normal operation with the 82495,
because floating the processor outputs may yield undefined 82495
behavior.


FLUSH# (and SYNC#) of caches requested by software must be decoded from
the 82495 outputs CM/IO#, CD/C#, and CW/R# (=001) and the latched
BE3-0# from the CPU. BE3-0# values of 0111 or 1101 should activate the
82495 FLUSH# input, as the Intel486 DX CPU outputs them in response to
the INVD and WBINVD instructions, respectively. Sync and flush commands
may also come from the bus as a message in a multiprocessor system. The
82495 allows assertion of FLUSH# or SYNC# at any time, and will delay
the beginning of the flushing action until all current CPU and M-bus
cycles have completed. The inputs are edge-sensitive. If the bus
defines cache flush messages, the MBC may activate the Intel486 DX CPU
FLUSH# input as well as the 82495's in response to bus message decodes.
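The software-flush decode described above amounts to a small combinational function. The sketch below is one reading of the text, not the PLD equation from the appendices: the function name is hypothetical, the cycle-type code is taken as CM/IO#, CD/C#, CW/R# = 0, 0, 1, and BE3-0# is assumed to be packed as a 4-bit value with BE3# as the most significant bit.

```python
# BE3-0# codes the text associates with INVD and WBINVD, respectively.
# Bit packing (BE3# as MSB) is an assumption made for this sketch.
FLUSH_BE_CODES = {0b0111, 0b1101}


def is_flush_request(cm_io_n, cd_c_n, cw_r_n, be3_0_n):
    """Return True when the 82495 FLUSH# input should be activated.

    cm_io_n, cd_c_n, cw_r_n: levels (0 or 1) of the 82495 outputs
        CM/IO#, CD/C#, and CW/R#.
    be3_0_n: BLE#-latched byte enables BE3-0# from the CPU as a 4-bit int.
    """
    special_cycle = (cm_io_n, cd_c_n, cw_r_n) == (0, 0, 1)
    return special_cycle and be3_0_n in FLUSH_BE_CODES


if __name__ == "__main__":
    print(is_flush_request(0, 0, 1, 0b0111))   # INVD-style code
    print(is_flush_request(0, 0, 1, 0b1011))   # some other special cycle
```

Because the 82495 FLUSH# input is edge-sensitive, a real MBC would register this decode on CLK rather than drive the pin from raw combinational logic.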


Bus Error or Timeout Detection logic in the MBC can use the CPU's PCHK#
output or other M-bus-specific signals to detect errors. Note that the
assertion of PCHK# will occur near the time of the error on the M-bus
ONLY for non-cacheable reads or 82495-cache-miss reads. For 82495 hits
and CPU-idle cycles, PCHK# may arise due to a floating or erroneous CPU
data bus value transferred on the M-bus much earlier. PCHK# must be
ignored by the MBC except during the CLK after data transfer to the CPU
was signalled by the MBC's CPU BRDY#, because PCHK# indicates i860 XP
CPU bus parity status at all times, not just during clocks of BRDY#
activation. The processor inputs INT, BERR, or NMI can be asserted by
the MBC to signal errors. To detect errors originating in the CPU or
82490 upon a write(back), the MBC can check parity on the 82490 MDATA
pins or on the M-bus.
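The PCHK# qualification rule above can be sketched as a filter over per-clock samples. This is a hedged illustration only: the function name and the list-of-samples representation are assumptions, both signals are treated as active-low levels sampled once per CLK, and "the CLK after" is read as exactly one clock after an active BRDY#.

```python
def qualified_parity_errors(brdy_n, pchk_n):
    """Return the CLK indices at which PCHK# should be believed.

    brdy_n, pchk_n: sequences of 0/1 levels, one sample per CLK
    (0 = asserted, since both signals are active low).
    A PCHK# assertion counts only in the clock immediately after a
    clock in which the MBC asserted BRDY# to the CPU.
    """
    errors = []
    for clk in range(1, min(len(brdy_n), len(pchk_n))):
        if brdy_n[clk - 1] == 0 and pchk_n[clk] == 0:
            errors.append(clk)
    return errors


if __name__ == "__main__":
    brdy_n = [1, 0, 1, 1, 0, 1]
    pchk_n = [0, 1, 0, 0, 1, 1]   # PCHK# is also low at clocks 0 and 3
    print(qualified_parity_errors(brdy_n, pchk_n))
```

In the sample trace only the PCHK# assertion that follows a BRDY# is reported; the others are ignored as stale or floating-bus parity.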


If the memory bus includes a retry protocol, the MBC bears the
responsibility to implement it, because the 82495 will not retry
accesses. For a pipelined MBC interface when the retry occurs after
CNA# to the 82495, the MBC must latch the address and other controls
(CW/R#, CM/IO#, etc.) from the 82495 to use in retries. Retry should be
triggered by signals other than the CPU PCHK# output, because the CPU
data transfer cannot be retried although the M-bus transfer can.

The 82490 can restart a burst data transfer (for the case of an error
detected after the first MBRDY# but before MEOC# and before CRDY#). To
restart the 82490, the MBC must deassert MSEL# for at least 1 MCLK.


While parity is supported by the 82495 and 82490, ECC (Error Correcting
Codes) cannot conveniently be used within the cache. ECC can be
implemented on the memory system, but no loads are permitted on the
CPU-to-82495/82490 interface wires for error correction logic.


Scenarios requiring MBC action are:

1) CPU based requests ("Master" mode):
• 82495 cache read miss (and line fill)
• 82495 cache write miss
• Non-cacheable CPU read (including i860 XP CPU pfld)
• Writethrough (to S-state line) or non-cacheable CPU write
• I/O reads and writes
• LOCKed reads and writes (will be readthrough or writethrough)

2) 82495 based requests ("Master" mode):
• Allocation due to write-miss (line fill)
• Replacement writebacks
• SNPADS# writebacks

3) Requests from other masters ("Slave" mode):
• Snooping of DMA accesses
• Snooping of accesses of other CPUs (in a multiprocessor)
• Bus-specific requests, like interrupt messages, reset requests,
  cache flushes, configuration registers, ID registers, timeout
  detection, [...], TLB shootdown


Transfer Tracking 


Tracking of transfers on the M-bus and CPUbus is required of the MBC
during all of the above scenarios. This tracking (counting) of
transfers involves activating BRDY# the correct number of times for the
CPU and MBRDY# (a possibly different number) for the 82495 and 82490.
Transactions on the CPUbus which must be MBC-controlled can be 1, 2, or
4 data transfers, decoded from the BLE#-latched CPU pins:

Intel486 DX CPU: BE3-0#, PWT, PCD
i860 XP CPU: BE7-0#, PWT, PCD, LEN, CACHE#

and from the 82495 pins CW/R#, MCACHE#, RDYSRC (and CLEN1:CLEN0 for
Intel486 DX CPU mode).

See [82495/490DS] for a complete definition of the encodings. The BRDY#
activations must be done only if RDYSRC=1, and always correspond to the
first 1, 2, or 4 MBRDY#s for the 82490-M-bus interface. The number of
MBRDY#s always exceeds or is equal to the number of BRDY#s, even for a
128-bit M-bus.


Bursts for line fills and writebacks on the CPUbus are always 4
transfers, but with some 82495 configurations the M-bus burst is 8
transfers. The addresses are nonsequential when the first access is not
at the zeroth word of the line. The addresses corresponding to each
BRDY# and MBRDY# follow these rules:

1) CPU burst addresses wrap at CPU line length.

2) When the line address is odd (A2=1 for 4-byte bus; A3=1 for 8-byte
   bus; A4=1 for 16-byte M-bus), the next address transferred on CDATA
   and MDATA is the LOWER address (e.g., 3 followed by 2). The
   odd-first-then-even pattern continues for all transfers of the
   burst. This order optimizes interleaved DRAM systems, and applies to
   both the M-bus and CPUbus.

3) 82490 bursts on CDATA wrap at CPU line length. 82490 MDATA burst
   addresses wrap at 82490 line length. For example, a linefill with
   LR=4 and a first Intel486 DX CPU address (A5:A2) = E:
   • 82490 CDATA ordering is E F C D
   • 82490 MDATA ordering is CDEF 89AB 4567 0123 (128-bit M-bus) OR
     EF CD AB 89 67 45 23 01 (64-bit M-bus)

For LR=2 (Line Ratio of 82495 to CPU) and CPUbus width = M-bus width,
the burst orders are shown below. Each address corresponds to one
4-byte transfer (for Intel486 DX CPUs) or one 8-byte transfer (for the
i860 XP CPU). Time increases left to right:

First Address    CPU transfers    M-bus transfers
      0            0 1 2 3        0 1 2 3 4 5 6 7
      1            1 0 3 2        1 0 3 2 5 4 7 6
      2            2 3 0 1        2 3 0 1 6 7 4 5
      3            3 2 1 0        3 2 1 0 7 6 5 4
      4            4 5 6 7        4 5 6 7 0 1 2 3
      5            5 4 7 6        5 4 7 6 1 0 3 2
      6            6 7 4 5        6 7 4 5 2 3 0 1
      7            7 6 5 4        7 6 5 4 3 2 1 0

For LR=2 and M-bus width = 2*CPUbus width (both buses using 4
transfers):

First Address    CPU transfers    M-bus transfers
      0            0 1 2 3        01 23 45 67
      1            1 0 3 2        01 23 45 67
      2            2 3 0 1        23 01 67 45
      3            3 2 1 0        23 01 67 45
      4            4 5 6 7        45 67 01 23
      5            5 4 7 6        45 67 01 23
      6            6 7 4 5        67 45 23 01
      7            7 6 5 4        67 45 23 01

The remaining transfer orderings for other LR values can be generated
similarly, as an exercise for the reader.

For requests originated by the 82495, the MBC must ignore the CPU pins
(CACHE#, LEN, PWT, PCD, PCYC, CTYP, and BE7#-BE0#). These requests are
writebacks, allocations, or linefills. Also, the MBC must prevent the
transfer of those signals to the M-bus for 82495 requests; for example,
it must force all BE7#-BE0# active during writebacks. The 82495 based
requests can be recognized by:

RDYSRC=0 .AND. MCACHE#=0 (for writebacks, linefills, allocations)
RDYSRC=0 .AND. MCACHE#=0 .AND. MKEN#=0 (for linefills, allocations)

For posted write requests (RDYSRC=0 and MCACHE#=1), the length is 1, 2,
or 4 transfers and the MBC must heed the BLE#-latched BE7-0#, LEN, and
CACHE#.
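The burst orderings tabulated earlier in this section are all consistent with an exclusive-OR pattern: the n-th transfer uses the starting index XORed with n, where the index counts M-bus-width units within the 82490 line. That observation is an inference from the tables, not a statement from the data sheet, and the function names and parameters below are hypothetical. Under that assumption, a minimal sketch that regenerates the orderings for any line ratio is:

```python
def cpu_burst(first, cpu_transfers=4):
    """CPUbus word addresses for a burst starting at word `first`."""
    return [first ^ n for n in range(cpu_transfers)]


def mbus_burst(first, line_ratio, width_ratio=1, cpu_transfers=4):
    """M-bus transfers for a line fill starting at CPU word `first`.

    line_ratio:  LR, the 82490 line length divided by the CPU line length.
    width_ratio: M-bus width divided by CPUbus width (1, 2, or 4).
    Returns one list of CPU-word addresses per M-bus transfer.
    """
    words_per_line = cpu_transfers * line_ratio
    if not 0 <= first < words_per_line:
        raise ValueError("first address must lie inside the 82490 line")
    n_transfers = words_per_line // width_ratio
    first_unit = first // width_ratio          # M-bus-width unit holding `first`
    units = [first_unit ^ n for n in range(n_transfers)]
    return [[u * width_ratio + k for k in range(width_ratio)] for u in units]


if __name__ == "__main__":
    print(cpu_burst(6))                              # 6 7 4 5
    print([t[0] for t in mbus_burst(5, 2)])          # 5 4 7 6 1 0 3 2
    print(mbus_burst(3, 2, width_ratio=2))           # 23 01 67 45
    print(mbus_burst(0xE, 4, width_ratio=4))         # CDEF 89AB 4567 0123
```

The last call reproduces the LR=4, 128-bit M-bus example given in rule 3, which is a useful cross-check on the assumed pattern.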


Clock Boundaries and Synchronization 


To optimize performance, the 82495/82490 allow total
decoupling of the CPU clock at 50 MHz from the
M-bus clock. While both the CPU and M-bus could
run at 50 MHz, the physical size of the M-bus would be
severely constrained. Future faster versions of the CPU and
82495/82490 would make a synchronous M-bus even
less feasible. However, with a 100% synchronous
interface, little time is lost in relaying requests from the
82495 CADS# to the M-bus, and in passing data
from the M-bus to the CPUbus.


Yet with careful design, a slower M-bus such as
33 MHz can handshake with a 50 MHz 82495 with
only a couple of clocks spent on synchronizing.
Furthermore, the transfers requiring synchronizing are
fairly rare: uncached cycles, cache misses, and snooping.
CPU performance is improved further because the
82495/82490 always post writes destined for the
M-bus, allowing the CPU to continue processing upon
write cache-misses and non-cacheable writes.


Most of the 82495 operates on the CPU CLK. Only the
snooping control inputs (SNPSTB#, SNPINV, SNPNCA)
operate on another clock, called SNPCLK.
SNPCLK can be the same as the MCLK controlling
82490 MDATA. A SNPCLK can be used with the 82495
even if the 82490 is strobed without an MCLK. All
82495 outputs, including snooping results (MHITM#,
MTHIT#, SNPCYC#, and SNPBSY#), remain on
the CPU CLK.


The 82490 operates half in the CPU CLK domain and
half in the M-bus domain. While no control signals flow
through the 82490 between memory and the CPU, the 82490
implements a flow-through data connection of CDATA
to MDATA. Synchronization of the 2 DATA
paths is unneeded, as the control signal MBRDY# gets
synched by the MBC to the CPU-clocked BRDY#.


The MBRDY# and BRDY# inputs control multiplexers
inside the 82490 to choose which part of a line-fill or
write is transferred to/from the bus. The MDATA input
latches are closed on MCLK (or MISTB for
non-clocked operation), and CDATA input latches are
closed with CLK.


MCLK or SNPCLK          Neither

MRESET
YBGT#
YMEOC#
YCEOC#
MBRDY#
MSWEND#                 MSWENDA
MADS#


If MCLK = CLK at 50 MHz, approximately 1.5 CLK
periods are required to transfer data through the 82490,
including 82490 propagation delay (15 ns) and setup
time to both the 82490 (5 ns) and CPU (7 ns for i860
XP CPU "CMOS" levels). The MBC must assure data
setup time at the CPU D0-D31 (D63) pins to the
rising edge of CLK for the cycle of BRDY# assertion
during reads, based on the propagation delay from
MDATA to CDATA listed in the 82490 AC timing
specs. Writes are not flow-through, as the 82490 always
buffers the write data and the 82495 later gives CDTS# for
the write.
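
The flow-through estimate can be checked with a little arithmetic. The sketch below simply adds the three delays quoted in the text and expresses the sum in CLK periods; the function name and parameter names are illustrative, and real designs should take the figures from the 82490 AC timing specs.

```python
def flow_through_clks(clk_mhz, prop_delay_ns, setup_82490_ns, setup_cpu_ns):
    """Delay through the 82490 on a read, in CLK periods.

    Sums the 82490 propagation delay and the two setup times
    quoted in the text, then divides by the CLK period.
    """
    clk_period_ns = 1000.0 / clk_mhz
    total_ns = prop_delay_ns + setup_82490_ns + setup_cpu_ns
    return total_ns / clk_period_ns


# Figures from the text: 50 MHz CLK, 15 ns + 5 ns + 7 ns = 27 ns.
clks = flow_through_clks(50, 15, 5, 7)
```

With the quoted figures the sum is 27 ns against a 20 ns CLK period, or 1.35 periods, which is in line with the "approximately 1.5 CLK" estimate once margin is allowed.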


Most of the MBC-to-82490 signals are sampled by the
82490 with MCLK, except for BRDY# and CRDY#:


MBC <-> 82490 Signals


MCLK          CLK
MBRDY#        BRDY#
MFRZ#         CRDY#
MZBT#
MDATA         CDATA
MSEL#
MEOC#


MDOE# (asynchronous to both clocks)


The MBC must be partitioned into an MCLK side and
a CLK side. Fortunately, the CPU side of the MBC passes
only a few signals to the MCLK side, and vice versa.
The signals listed below from the dual-i860 XP CPU
MBC design in Appendix C must go through a
synchronizer. Refer to the Appendix for signal definitions.
In the following diagram, a right-arrow (->) identifies
synchronizing to CLK, while a left-arrow (<-)
means synchronizers on MCLK:


Clock Domain of the Signal:


CLK

RESET
BGT#
CRDY#
BRDY#_maybe
BRDY#_maybe
SWEND#
CADS# .OR. SNPADS# .OR. CDTS#


The signals MKWEND# and MNA# may also need synchronizing to CLK, if they are derived from M-bus
responses.


Two TI 74AS4374 "Dual-Rank Synchronizer" chips
(Figure 7) are used to transfer critical signals between
clock domains, while avoiding metastability. This
20-pin DIP has one clock input and 8 pairs of flip-flops.
Thus each of the 8 "Q" outputs reflects the value of its
"D" input after 2 clock periods. One chip is clocked by
CLK and the other by MCLK. If fewer than 8 signals
need synchronizing, chips such as the Signetics
74F50728 or Intel's 85C220 EPLD can combine
synchronization with other functions [Ham90].
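
A behavioral sketch of one dual-rank flip-flop pair may help show where the 2-clock delay comes from. This is a purely logical model written for illustration: it does not model metastability, setup/hold windows, or the actual device, and the class name is hypothetical.

```python
class DualRankSynchronizer:
    """Two flip-flops in series on the same clock (one signal pair)."""

    def __init__(self, initial=1):
        # Active-low signals idle high, so default both ranks to 1.
        self.rank1 = initial
        self.rank2 = initial

    def clock(self, d):
        """Apply one rising clock edge with input `d`; return the Q output."""
        self.rank2 = self.rank1  # second rank takes the first rank's old value
        self.rank1 = d           # first rank samples the asynchronous input
        return self.rank2


sync = DualRankSynchronizer()
# Input goes active (0) for two clocks, then inactive again.
outputs = [sync.clock(d) for d in [0, 0, 1, 1]]
```

In this model a value sampled on one clock edge appears at the output on the following edge, so a change on D is visible at Q within 2 clock periods.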


For an asynchronous or strobed memory bus, M-bus
signals (such as MBRDY#) get delayed by the
synchronizer for 2 CLK periods before the 82495 can see
them. For a clocked (but not by CLK) M-bus, 82495
outputs (such as CADS#) get delayed by 2 MCLKs by
the other synchronizer before the M-bus sees them.


The following 82495 signals are defined as "asynchronous",
meaning that no external synchronizer is required:


• FLUSH#, SYNC#
• MALE, MBALE
• MAOE#, MBAOE#

Many signals can cross clock boundaries without
synchronizing, because they will be ignored until
corresponding status signals such as SWEND# and
CADS# have been synchronized by the MBC. Thus
they will be stable when sampled:


• MWB/WT#, DRCTM#, MTHIT#, MHITM#
  (sampled when SWEND#)


• RDYSRC, KLOCK#, CPLOCK#, CW/R#,
  CD/C#, CM/IO#, MCACHE#, BE7:0# (sampled
  when CADS#)


Other signals do not cross clock boundaries, but remain
within the MBC CLK logic:


• CNA#, PALLC#, CACHE#, LEN, PCD, PWT,
  CTYP, PCYC, MFRZ#


Synchronizer Delays 


To avoid lost time due to synchronizer delays, the
following options exist:


1. Pipeline the 82495/MBC interface. This hides the
   delay in synchronizing CADS# to its MCLK
   counterpart MADS#.

2. Define the M-bus protocol so that MBRDY#
   precedes MDATA by 1 MCLK for reads. Thus the 2
   CLK delay in creating BRDY# from MBRDY# is
   hidden. Likewise define MSWEND# to precede
   MHITM# and MTHIT# by a CLK, by creating
   MSWEND# from SNPCYC#.

3. Keep the snooping signals (SWEND#, MHITM#,
   MTHIT#, SNPINV, SNPCYC#) which flow
   between 82495s on the same CLK, so that no
   synchronizers enter the snoop path. This is feasible only
   for a small number of physically proximate CPUs.

4. Synchronize the snooping feedback signals from the
   M-bus (MSWEND#, etc...) only at the destination.
   They will be asynchronous to MCLK, but synchronous
   with the individual CLK of their source.

5. Avoid MCLK, using a strobed-only M-bus. Strobed
   buses appear in single-CPU systems with an
   unclocked DRAM interface.

6. Activate MEOC# to the 82490 as soon as possible after
   the last MBRDY#. MEOC# allows the 82490 to begin
   the next data transfer without waiting for CRDY#
   synchronization.


[Schematic and waveforms: one dual-rank flip-flop pair (1/4 of a 74AS4374). 240957-7]

Figure 7. Synchronizer Hardware and Waveforms


BRDY# Generation


Below are recommended sequences of the 82490 and
CPU burst-transfer "Readys" for CPU reads, assuming
the bus widths are equal. Sequences with more clocks of
delay are acceptable but suboptimal.


1) Synchronous M-bus (MCLK = CLK): MBRDY#
   precedes BRDY# by 1 or 2 CLKs, to allow
   propagation time for data through the 82490 and setup
   time at the CPU pins.


2) "Divided Synchronous" M-bus (e.g., CLK = 50
   MHz, MCLK = 25 MHz, skew controlled):
   MBRDY# precedes BRDY# by 1 or 2 CLKs. The
   BRDY# state machine must ignore MBRDY# in
   the CLK period after it was sampled active.


3) Other Clocked M-bus (MCLK <> CLK):
   MBRDY# must go through a dual-rank
   synchronizer latch (such as the TI 74AS4374) clocked by
   CLK to produce BRDY#. That means 2 CLK
   delays between MBRDY# and BRDY#. MBRDY#
   MUST remain active for at least 1 CLK period to
   assure that the synchronizer latched it active. To
   avoid one MBRDY# getting wrongly sampled
   active twice, the BRDY# state machine should ignore
   any second MBRDY# in the CLK period after it
   was sampled active.


4) Strobed M-bus: here MISTB# must go through the
   synchronizer with 2 CLK delays to create BRDY#.
   An edge-sensitive strobed M-bus avoids the problem
   of wrongly converting one M-bus transfer to 2
   BRDY#s, as a level-change marks each M-bus
   transfer.


When M-bus width is greater than CPUbus width, the
above rule holds only for the first BRDY#. Successive
BRDY# activations follow the rules below:


• M-bus = 2*CPUbus: 2 BRDY#s occur for each of
  the first 2 MBRDY#s. The second BRDY# should
  occur 1 CLK after the first. The third BRDY#
  cannot begin until after the second MBRDY#.


• M-bus = 4*CPUbus: 4 BRDY#s occur for the
  MBRDY#. The last 3 BRDY#s can occur
  immediately in the 3 CLKs after the first BRDY#.
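
The two BRDY# rules above can be sketched in a few lines. The first function models the "ignore MBRDY# in the CLK period after it was sampled active" rule; the second lists the CLKs on which BRDY#s could be issued for a wider M-bus. Both are simplified illustrations with hypothetical names: signals are active-low (0 = active), each MBRDY# activation is assumed to be seen on at most two consecutive CLK edges, and a 4-transfer CPU line is assumed.

```python
def brdy_from_mbrdy(samples):
    """Map synchronized MBRDY# samples (one per CLK) to BRDY# values.

    After MBRDY# is sampled active, the next CLK's sample is ignored
    so that one M-bus transfer never produces two BRDY#s.
    """
    brdy, ignore_next = [], False
    for s in samples:
        if ignore_next:
            brdy.append(1)       # sample ignored, BRDY# stays inactive
            ignore_next = False
        elif s == 0:
            brdy.append(0)       # MBRDY# active: issue one BRDY#
            ignore_next = True
        else:
            brdy.append(1)
    return brdy


def brdy_schedule(ratio, first_brdy_clks, cpu_transfers=4):
    """CLK numbers of each BRDY# when M-bus = ratio * CPUbus.

    first_brdy_clks[k] is the earliest CLK on which data from the
    k-th MBRDY# may be acknowledged to the CPU.
    """
    schedule = []
    for start in first_brdy_clks:
        for i in range(ratio):
            if len(schedule) == cpu_transfers:
                return schedule
            clk = start + i
            if schedule and clk <= schedule[-1]:
                clk = schedule[-1] + 1   # never two BRDY#s in one CLK
            schedule.append(clk)
    return schedule
```

For example, with a double-width M-bus and MBRDY# data usable at CLKs 3 and 6, `brdy_schedule(2, [3, 6])` yields BRDY#s at CLKs 3, 4, 6, and 7.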


For asynchronous systems (MCLK < CLK), high
performance design choices are:


M-bus width = 2 * CPUbus width   OR
M-bus width = 4 * CPUbus width


The wider M-bus allows each M-bus transfer to satisfy 
2 or 4 CPU transfers, so that the CPU is not starved for 
data during a line fill. The 82490 switches its CDATA 
outputs to the next value the CLK after BRDY # asser- 
tion by the MBC for the current value, so the MBC 
controls the provision of data to the CPU on linefills. 


A low-cost MBC can use M-bus width = CPUbus width with
a slower MCLK, by converting the first MBRDY# to
BRDY# through a synchronizer. The last 3 BRDY#s
can be asserted by the MBC after completion of all the
M-bus transfers. That will allow the CPU to proceed
executing after receiving the first datum, which is the one
it was waiting for in most cases. Alternatively, the
M-bus protocol can be defined so that no idle clocks occur
on the M-bus after the first MBRDY# and the MBC
knows by counting CLKs when to assert successive
BRDY#s.


Shown in the following timing diagrams are data trans- 
fers on both buses for CPU reads. Although they as- 
sume no dead clocks (wait states) during the M-bus 
burst, dead clocks are allowable. 


Writes are not shown in the diagrams because the MBC
never supplies the CPU BRDY#s for burst writes.
RDYSRC=0 for most writes, and the 82495 controls
the CPUbus transfers. The exception to this rule is I/O
writes, which the 82495 does not post; for I/O writes, the
MBC supplies BRDY# to the CPU, but I/O accesses
are always 1 non-bursting transfer.


[Timing diagram 240957-8: CLK, CDATA, BRDY#, MDATA, MBRDY#]


[Timing diagram 240957-9: CLK, CDATA, BRDY#, MDATA, MBRDY#, MCLK]

Figure 9. Data Transfers, M-bus Width = CPUbus Width. CLK/2 < MCLK < CLK.
Note the starvation on the CPUbus (extra wait state)


[Timing diagram 240957-10: CLK, CDATA, MDATA (0,1 then 2,3), MBRDY#, MCLK]


[Timing diagram 240957-11: CLK, CDATA, BRDY#, MDATA, MBRDY#, MCLK]


Figure 11. Data Transfers, M-bus Width = 4*CPUbus


Pipelining

Pipelining the MBC-to-82495 interface reduces latency
by allowing the MBC to arbitrate for the next M-bus
transaction while the first is proceeding. If the M-bus is
also pipelined, it allows the snoop for the next to begin
during the data transfer for the first.


Signals used in pipelining the 82495 are CNA#,
BGT#, MALE, KWEND#, SWEND#, and
CDTS#. The 82495 will not listen to CNA# until the
clock of BGT# activation. Also, KWEND# activation
sometimes allows the 82495 to create a next cycle, such
as an allocation after a write miss. MALE deassertion
allows the memory address to remain at the value for a
previous request, even though the next request CADS#
and other control signals have already occurred in
response to CNA#. The MBC must latch the 82495
output signals which change in response to CNA#, until
their status no longer matters to ongoing cycles.


Note that the 82495 and 82490 automatically pipeline the
CPUbus interface to the i860 XP CPU by activating NA#
and latching address and data.


Pipelining the M-bus itself involves sending a next
address for snooping and DRAM access while data
transfer from the current address still remains incomplete.
This increases bandwidth by overlapping slow DRAM
access with bus data and address transfers, as in the
i860 XP CPU pipelined bus.


[Timing diagram residue: MCLK cycles 1 to 5; MDATA (A0, A1, A2, A3), MBRDY#, MADDRESS, MADS#, MCLK; second diagram adds MSWEND#]


While each 82495 allows only a one-stage deep pipeline,
the M-bus can have a deeper pipe as requests from
several different 82495s can be in progress. The number of
stages in the M-bus pipe should match memory access
latency. For example, use 3 stages for a 240 ns
memory with a 120 ns bus MADS#-to-MNA# (and
SWEND#) time, so that a second and third request get
issued during the memory latency of the first.
Pipelining does not imply that multiple snoops are ongoing
waiting for SWEND#; that is a split-transaction bus,
defined in a later section. Thus a quick SWEND#
turnaround time speeds a new request onto the M-bus.


The advantage of a pipelined bus using a 4-transfer
burst is illustrated in Figures 12 and 13. Assumed is a
fast memory access time of 4 MCLKs. With a slower
access time, pipelining becomes more important for
maintaining data bus bandwidth; even with the
4-MCLK access, the unpipelined data bus is idle 50%
of the time.
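
Both numbers in this passage can be reproduced with simple arithmetic. The sketch below is one reading of the text, not a formula given by it: it assumes the pipe depth is the number of request intervals that fit in the memory latency plus the request already in flight, and that an unpipelined bus is idle for the whole access time before each burst. Function names are illustrative.

```python
import math


def pipeline_stages(memory_latency_ns, request_interval_ns):
    """Pipe depth matching memory latency (assumed formula).

    One stage for the first request, plus one for every further
    request that can be issued while it waits on memory.
    """
    return math.ceil(memory_latency_ns / request_interval_ns) + 1


def unpipelined_idle_fraction(access_mclks, burst_mclks):
    """Fraction of time the data bus is idle without pipelining."""
    return access_mclks / (access_mclks + burst_mclks)
```

With the figures from the text, a 240 ns memory and a 120 ns MADS#-to-MNA# time give 3 stages, and a 4-MCLK access followed by a 4-transfer burst leaves the data bus idle half the time.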


M-bus Arbitration 


If the M-bus possesses more than one master, each
MBC must arbitrate to gain control of the M-bus
whenever its 82495 activates CADS#. No arbitration logic is
included in the 82495 or 82490, except for the ability to
float (Hi-Impedance) the 82495 and 82490 M-bus
outputs via the MAOE# and MDOE# signals. The
BGT# and MAOE# inputs to the 82495 are from MBC
arbitration logic. The simplest systems can use a
HOLD/HLDA/BREQ protocol like the i860 XP CPU
and Intel486 DX CPUs themselves, which is
centralized arbitration.


[Timing diagram residue: MCLK cycles 7 to 12]

Figure 13. Data Transfers for Pipelined M-bus


Expandable buses like Futurebus+ and Multibus-II use
distributed arbitration to allow a variable number of
masters. Bus parking (retaining ownership of the M-bus
until another master requests it) is advised to avoid
unnecessary delay.


The "restricted backoff protocol" of the 82495 requires
that it be granted the bus for a modified-line writeback
after it activates MHITM#, before it will snoop or
initiate any other transactions. The snooping MBC must
relinquish the M-bus immediately after the CRDY# of
the M-line writeback so that the original owner can
complete its work.


Sequencing


A typical sequence of request and response signals
between the 82495 and MBC is shown in Figure 14. The
"SL" entities (CPU_SL, 82495_SL, 82490_SL, MBC_SL) are
for another CPU/Cache core, the SLave(s) who snoop
when the master CPU owns the bus. No DMA (such as
EISA or MCA) interaction is shown, but it will be
similar to the CPU responses, except that no writeback will
be done by DMA. Time increases downward. A
minus-sign prefix means deassertion.


The arbitration for the M-bus shown in the diagram 
assumes a HOLD/HLDA protocol like the CPUs use. 
That is a primitive centralized scheme, suitable only for 
a small number of processors. 


The sequencing may vary from that shown; for
example, MSEL# may precede CDTS#. MADS#,
MW/R#, MA31:3, MM/IO#, and
MBE7-0# would all be valid simultaneously. The
signals in parentheses would be asserted only in the case of
an M-line hit in the snooper, and some signals for that
writeback and possible cache-to-cache transfer are not
shown.


[Sequence diagram 240957-14. Columns, left to right: CPU, 82490, 82495,
MBC, M-bus, MBC_SL, 82495_SL, 82490_SL, CPU_SL. Signals legible in the
diagram: MAOE#, MALE, MW/R# etc., SNPSTB#, SNPINV, MKEN#, MRO#,
KWEND#, BGT#, CNA#*, SNPBSY#*, MSEL#*, MDOE#, SNPCYC#, MHITM#,
MWB/WT#, MTHIT#, (-MAOE#), (MADS#), (SNPADS#), MBRDY#.]

*   = Signal might occur sooner or not at all, depending on the type of request and bus protocol.
( ) = These lines of the sequence occur only on a Hit-to-Modified (MHITM#)


Figure 14. MBC Signals and Protocol Layers


Flowchart of MBC Algorithm (not applicable to all cases)


M-bus already owned by this MBC ?
   N: Arbitrate for bus.
   |
Enable 82495 to drive address to bus (MAOE#, MALE).
Echo other request parameters (MW/R#, MCACHE#, etc...) to the bus.
Assert BGT#.
   |
Determine cacheability, assert pins KWEND#, MKEN#, MRO#.
Latch control signals (MW/R#, etc...).
Assert CNA# to invoke next 82495 request.
   |
MHITM# from other masters ?
   Y: Abort memory cycle. Do cache-to-cache transfer.
   |
Wait for CDTS# (before beginning data transfer).
   |
Forward snoop responses to master 82495
using SWEND#, MWB/WT#, DRCTM#.
Signal burst transfers of M-bus via MBRDY#.
If RDYSRC=1, echo burst transfer acknowledgments on BRDY#.
Compensate for LR<>1 by stopping BRDY# assertion when CPU line filled.
   |
Notify 82495 and 82490 of completion of transfer via MEOC# and CRDY#.
   |
New CADS# ?
   |
Relinquish bus ownership.
Deassert MAOE# to re-enable snooping by this 82495.


240957-25


Cacheability of each request must be determined by the
MBC to prevent the 82495 and CPU from caching
things like memory-mapped I/O device registers. The
i860 XP CPU samples its KEN# (Cache ENable)
pin at the time of the first BRDY# for a transfer or at
NA#, whichever comes first. The 82495 offers more
flexibility than the CPU cacheability indicators, by
using the KWEND# (cacheability Window END)
input to indicate validity of the MKEN# and MRO#
pins. The values of MKEN# and MRO# are based on
address decode, either locally in the MBC or from a
centralized decoder on the memory bus. For best
performance, KWEND# should come as soon as possible,
as it allows the 82495 to decide what the next CADS#
should be: for example, to begin an allocation for a
write miss, or to start another writethrough.


A typical implementation would activate KWEND# 2 
clocks after CADS#, using a PLD or fast SRAM to 
decode the upper bits of the address to generate 
MKEN # and MRO#. 
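
As an illustration of the decode step, the sketch below maps the upper address bits to MKEN# and MRO# values. The memory map is entirely hypothetical (it is not taken from any Intel design), and the pins are treated as active-low, with MKEN#=0 meaning cacheable and MRO#=0 meaning read-only; check both conventions against the 82495 data sheet before relying on them.

```python
# Hypothetical memory map keyed on address bits 31:20 (1 MB granularity):
# (first_mb, last_mb, cacheable, read_only)
REGIONS = [
    (0x000, 0x7FF, True, False),   # DRAM: cacheable, writable
    (0xFFF, 0xFFF, True, True),    # boot ROM: cacheable, read-only
]


def decode_cacheability(address):
    """Return (MKEN#, MRO#) for a 32-bit address, both active-low.

    Addresses outside every region (e.g. memory-mapped I/O)
    default to non-cacheable.
    """
    upper = (address >> 20) & 0xFFF
    for first_mb, last_mb, cacheable, read_only in REGIONS:
        if first_mb <= upper <= last_mb:
            return (0 if cacheable else 1, 0 if read_only else 1)
    return (1, 1)
```

A table this small fits easily in a PLD; a fast SRAM indexed by the upper address bits would hold the same two bits per entry.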


Note that KWEND#, SWEND#, and BGT# need
not be asserted by the MBC for SNPADS# cycles
(snoop writebacks), but it may be simpler to assert
them always.


Snooping 


Snoop handshaking (bus watching) is useful in a
multiprocessor system, and may be needed in a uniprocessor
system where the 82495 and CPU caches must be kept
consistent with DMA accesses. The 82495 must snoop
all DMA accesses to memory. The MBC sees requests
from DMA (or other processors) on the M-bus and
converts them to SNPSTB# activations to the 82495. The
following scenarios are possible:


• DMA (or other processor) read causes 82495
  MHITM#: the 82495/82490 must write back the
  modified line to memory before the first DMA data
  transfer occurs (unless the DMA controller is
  capable of retrying the read; if the DMA can retry, then
  the 82495 writeback must cause the initial DMA
  access to be aborted). The MBC can assert
  SNPNCA (SNooP Non-CAcheable Access) to the
  82495 for a DMA read, so that the 82495 knows it
  can keep the block Exclusive upon a hit.


• DMA (or other) read causes 82495 MTHIT# but
  not MHITM#: the MBC must assert the "shared"
  status line of the M-bus, if the bus includes such a
  line.


• DMA (or other) write causes 82495 MHITM#:
  the 82495/82490 must write back the modified line to
  memory before the first DMA data transfer occurs.
  SNPINV should be activated to the 82495 to invalidate
  the line.


• DMA (or other) write causes 82495 MTHIT# but
  not MHITM#: SNPINV should be activated to the
  82495 to invalidate the line.


Note that the 82495/82490 cannot "write snarf": they do
not absorb write data from the memory bus and merge it
with current cached contents of the line. However, they
can absorb a full-line writeback from the M-bus when
doing a linefill of the same address (see the section on
Cache-to-Cache Transfers).
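
The four snoop scenarios above reduce to a small decision table. The sketch below encodes it as a function returning the actions the MBC would take; the function name, argument names, and dictionary keys are illustrative. It assumes SNPINV is driven along with the snoop strobe and therefore depends only on the access type, while the writeback and "shared" actions depend on the snoop result.

```python
def snoop_plan(access, mthit, mhitm):
    """MBC actions for a snooped DMA (or other-processor) access.

    access: "read" or "write"
    mthit, mhitm: True if the 82495 activated MTHIT# / MHITM#
    """
    if access not in ("read", "write"):
        raise ValueError("access must be 'read' or 'write'")
    return {
        # Writes must invalidate any cached copy of the line.
        "snpinv": access == "write",
        # A modified hit forces a writeback before the first data transfer.
        "writeback_first": mhitm,
        # A clean read hit is reported on the M-bus shared line, if present.
        "assert_shared": access == "read" and mthit and not mhitm,
    }
```

For instance, a DMA write that hits a modified line returns both `snpinv` and `writeback_first` as true.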


Bus size adaptation can be done by the MBC, although 
it is not necessary in most systems. In an Intel486 DX 
CPU or i860 XP CPU system without an 82495/82490, 
an 8-bit device like a ROM can be used to contain code, 
and the CPU will automatically fetch at byte-width 
when the BS8# (Intel486 DX CPU) or CS8 (i860 XP 
CPU) pin is asserted. However, if a byte-wide ROM is 
used with an 82495/82490, adaptation of this byte in- 
terface is required from the MBC. 


If the ROM code is to be cacheable, the MBC must 
convert the 82495 line fetches at the ROM location to 
the appropriate number of byte-wide ROM reads. 
Latching transceivers must be employed at the 82490 
MDATA inputs or at the ROM output, to assemble the 
single-byte ROM reads into 4 (or 8) bus-width-wide 
transfers to the 82490s. 
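
The assembly step performed by the latching transceivers can be pictured as follows. This sketch groups byte-wide ROM reads into bus-width words; the 8-byte (64-bit) bus width, the 32-byte example line, and little-endian byte placement are assumptions made for illustration, and the function name is hypothetical.

```python
def assemble_line(rom_bytes, bus_bytes=8):
    """Group single-byte ROM reads into bus-width transfers.

    rom_bytes: byte values in ascending address order.
    Returns one integer per M-bus transfer, lowest address in the
    least significant byte lane (little-endian assumption).
    """
    if len(rom_bytes) % bus_bytes:
        raise ValueError("line length must be a multiple of the bus width")
    return [
        int.from_bytes(bytes(rom_bytes[i:i + bus_bytes]), "little")
        for i in range(0, len(rom_bytes), bus_bytes)
    ]


# A 32-byte line read one byte at a time becomes 4 transfers of 8 bytes.
line = assemble_line(list(range(32)))
```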


If the particular M-bus protocol requires transfer 


widths shorter than the 82490 data width used, the ad- 
dress range requiring such transfers can be made non- 
cacheable to force 82495 and 82490 to use the width 
given in the request from the CPU. 


Bus size adaptation would also be needed to support a 
512kB cache on a 32-bit memory bus. In that case, the 
MBC must control transceivers and MBRDY #s to in- 
terface between the 64-bit 82490 MDATA path and the 
32-bit M-bus. 


Bus Signal Levels 


Redriving 82495/82490 signals to the M-bus (such as
MDATA, addresses, and 82495 control outputs) can
optionally be done by the MBC. If the M-bus signal
levels are not TTL (for example, ECL or Futurebus+ BTL,
Backplane Transceiver Level), then appropriate
transceivers must lie between the M-bus and 82495/82490.
Also M-buses with heavy capacitive loads should be
redriven by transceivers, although the 82495 and 82490 can
tolerate loads of up to 100 pF.


An additional advantage of buffering the 82495/82490
signals with transceivers in a multiprocessor is that a
"local M-bus" will exist between the chips and the
main system M-bus. That allows some local traffic from
the CPU module to attached peripherals to avoid
traversing the M-bus. Such peripherals might include an
MPIC/CCU (MultiProcessor Interrupt Controller/
Concurrency Control Unit), a JTAG boundary-scan
controller, or a time-of-day clock, as in the Sequent
Symmetry multiprocessor.


8.0 MBC FUNCTIONS FOR
    MULTIPROCESSORS


Multiprocessor cache designs have additional
motivations beyond the uniprocessor goal of reducing memory
access latency. Reducing memory bus usage is
especially important because the sharing of the bus creates a
bottleneck. Thus multi-82495 systems need to minimize
the number of transactions and make each one as short
as possible. Large caches (256k or 512k) are
recommended for multis, to keep the miss rate as low as
possible.


In addition to the uniprocessor functions, an MBC in a
multiprocessor must handle consistency with caching
agents other than its own 82495. The multiprocessor
MBC may also for performance reasons implement
snoop filtering, cache-to-cache transfers,
read-for-ownership, and split transactions.


Snooping results from listeners (slaves) on the bus must
be fed back to the master 82495 by the time SWEND#
is activated, if the system uses writeback policy
(writethrough requires no feedback). These results
(DRCTM#, MWB/WT#) are translations of the
slaves' MHITM# and MTHIT# outputs. As shown in
Figure 15, typically all MHITM# outputs would be
wired-or via open-collector transceivers. Because slaves
on the bus may be busy with CPU operations and
back-invalidations, the snoop delay can vary. Thus a latched
derivative of the SNPCYC# output of all 82495s
would be wired-or to derive SWEND#. Alternatively,
the MBC can count CLKs to generate SWEND#,
using the worst-case upper bound of CLKs required for
all 82495s to snoop, but that makes all snoop windows
long.


[Block diagram. Legible labels: DRCTM#, MWB/WT#, SNPSTB#,
dual-rank synchronizer, MADS#, MINHIBIT# (MHITM#), MSHARED# (MTHIT#)]

Figure 15. Creating Snoop Results from MHITM#, MTHIT#, and SNPCYC#


Because the 82495 will tolerate SWEND# arrival up until
CRDY#, the M-bus data transfer for reads can overlap
the snooping delay. The transfers (MBRDY#s) can
occur during snoop latency, and an MHITM# activation
would cause the MBC to restart the transfer using the
82490's MSEL# pin.


If an 82495 linefill or writethrough hits a dirty line in
another cache, the MBC cannot BACKOFF the 82495.
Call that other cache "the dirty 82495," and the
initiating 82495 "the master 82495". The master MBC
must force a retry of the access after the dirty 82495
dumps the line, but the master 82495 has no "Backoff
and Retry" input pin. Rather, on a linefill the master
82495 must see the data transfer as if it had come from
memory. On a write, the master 82495/82490 data
write must wait until the modified line from the dirty
82495 has been dumped to memory. To do so, the
master MBC can either:


1) Delay the corresponding MBRDY#s to the master
   82490 until the modified line is completely written
   into memory and read out of memory. That implies
   the master MBC will remake the initial request to
   the memory controller after the writeback.

OR

2) Create a cache-to-cache transfer, so that the
   writeback data movements go directly into the master
   82490 over the M-bus. A later section describes
   cache-to-cache transfers. Such transfers are quicker
   than waiting for the entire modified line to be
   written back to memory.


Note that the 82490 can restart the data transfer for
reads or writes, in the case of MHITM# activation
after the first MBRDY# but before MEOC# and
before CRDY#. To restart the 82490, the MBC must
deassert MSEL# for at least 1 MCLK.


Snoop Window Time (the delay from MADS# to
SWEND#) limits address-bus bandwidth. In the
interval from the address on the M-bus until the
acknowledgement (SNPCYC#) by all listeners, no more requests
(addresses) can be on the bus. This restriction is
implied by:

1) A typical M-bus has only one MSWEND# wire,
   which cannot be identified with the proper request if
   several requests are outstanding.


2) The 82495 does not snoop between BGT# and
   SWEND#.


3) The 82495's "restricted backoff protocol". That protocol
   requires the M-line writeback to be the first
   transaction by any 82495 which generates MHITM#, and
   the 82495 cannot snoop anymore until it finishes the
   MHITM# writeback.


Data for read-misses cannot be transferred on the
CPUbus until SWEND#, because the MBC cannot
abort a CPU transfer after giving the first BRDY#.
Thus the snoop window length influences CPU
performance. Depending on the number of processors, bus
speed, and memory speed, two scenarios arise from
snoop window length versus memory access latency:

1) SWindow < Memory Latency: SWEND# precedes
   the MBRDY#s. If MHITM# occurs, the original
   memory access can be aborted and its MBRDY#s
   must be ignored.

2) SWindow > Memory Latency: data transfer on the
   M-bus can proceed, with MBRDY#s causing 82490
   linefill buffers to advance. After SWEND#, the
   MBC can begin BRDY#s to the CPU and 82490 if
   MHITM# is inactive. If MHITM# is active, the
   MBC must restart the M-bus data transfers after (or
   during) the writeback from the modified snooper,
   and can begin BRDY#s immediately after the first
   MBRDY#.


The typical snoop window in a multiprocessor using
the hardware of Figure 15 is about 7 CLKs total snoop
turnaround delay, shown in Figure 16:


    1 CLK         for propagation delay of master's
                  MADS# (to slave 82495s' SNPSTB# inputs)

  + 0.5 to 1 CLK  for 82495 to internally latch
                  SNPSTB# and synchronize it to CLK

  + 1 CLK         for 82495 tag lookup and SNPCYC#
                  (or more, if 82495 is busy with SNPBSY#)

  + 1 CLK         to latch SNPCYC# into the MBC
                  Set/Reset flip-flop generating MSWENDA

  + 1 CLK         for MSWENDA open-collector buffer
                  and settling time from all slaves

  + 2 CLKs        for MSWENDA to get through the
                  synchronizer (on the master MBC's CLK) and
                  inverter to generate SWEND# to a
                  master 82495
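
The "about 7 CLKs" total can be checked by summing the itemized delays. The sketch below does so using a minimum and maximum for each entry; it assumes 1 CLK each for the tag-lookup and open-collector entries, whose figures are hard to read in this copy, and it ignores any extra SNPBSY# delay. The names are illustrative.

```python
# (description, min CLKs, max CLKs); the two entries marked "assumed"
# use 1 CLK because the printed value is not legible in this copy.
SNOOP_WINDOW_ITEMS = [
    ("MADS# propagation to slave SNPSTB# inputs", 1.0, 1.0),
    ("82495 latches and synchronizes SNPSTB#", 0.5, 1.0),
    ("82495 tag lookup and SNPCYC# (assumed)", 1.0, 1.0),
    ("latch SNPCYC# into MBC flip-flop (MSWENDA)", 1.0, 1.0),
    ("MSWENDA open-collector settling (assumed)", 1.0, 1.0),
    ("MSWENDA through synchronizer to SWEND#", 2.0, 2.0),
]


def snoop_window():
    """Return (min, max) snoop turnaround delay in CLKs."""
    low = sum(item[1] for item in SNOOP_WINDOW_ITEMS)
    high = sum(item[2] for item in SNOOP_WINDOW_ITEMS)
    return (low, high)
```

Under these assumptions the window comes to between 6.5 and 7 CLKs, consistent with the stated total.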


The window total assumes that the slave 82495s' one
CLK delay from SNPCYC# until MHITM# is
concurrent with the synchronizer delay for creating
SWEND# from MSWENDA at the master. Those 2
CLKs can overlap with the next MADS# if it is
asynchronously generated from MSWENDA. Shorter
snoop window times can be obtained using duplicate
external tags as explained later, but this is not trivial.


Read for Ownership (RFO) protocols decrease bus
traffic by avoiding the M-bus write which would occur
upon a write-miss. That is, a write-miss would go to the
bus, followed by an 82495 line allocation request for the
missed area. With RFO, the MBC does not echo the
82495 write request to the M-bus. Instead, it asserts
MFRZ# to freeze the written data in the 82490
memory buffer, and allows the subsequent 82495 allocate line
request to go to the bus. When the line data returns on
the M-bus, the MBC asserts DRCTM# to cause the 82495
to mark the line as Modified (the memory system and
other caching agents do not know of the original write
miss, so they have invalid copies of the line).


[Waveform annotations: "Master 82495 sees SWEND#"; "Slave 82495s
begin snoop (CLK) after internally synching SNPSTB#"]

Figure 16. Snoop Waveforms


Signals which the MBC must use to do RFO are:


1) PALLC# (Potential ALLoCate): from the 82495,
   must be active on the write miss. If not, RFO cannot
   be performed.


2) MKEN# and CRDY#: must be activated by the
   MBC for the write, to trigger the 82495's subsequent
   allocation request.


3) MFRZ#: must be activated by the MBC to the
   82490 at the time of the MEOC# and CRDY# for
   the write.


4) INVAL (memory bus Invalidate indication): must
   be asserted by the MBC during the allocate-read to
   force all other 82495s to invalidate their
   now-obsolete copies of the line. Slave MBCs will assert
   SNPINV to 82495s.


5) DRCTM# (DiReCt To Modified): must be asserted
   by the MBC during the SWEND# of the allocate, to
   make the 82495 put the line in M-state.


6) MWB/WT#: must be asserted during the
   SWEND# of the allocate.


7) CPLOCK# (82495 Pseudo Lock in Intel486 DX
   CPU systems): if active, the MBC must NOT do
   RFO, because the 82495 will activate PALLC# only on
   the second of the 2 writes. If the MBC tried to RFO,
   it would merge only half of the data into the
   modified line.


See [82495/490DS] for RFO information.
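
The two gating conditions for RFO (items 1 and 7) can be stated as a single predicate. The sketch below is only a restatement of those two rules; the function and argument names are illustrative, and the arguments are booleans meaning "signal is active" rather than pin levels.

```python
def can_do_rfo(pallc_active, cplock_active):
    """True if the MBC may handle this write miss with RFO.

    PALLC# must be active on the write miss, and CPLOCK# must be
    inactive, since a pseudo-locked pair of writes would leave only
    half of the data merged into the modified line.
    """
    return pallc_active and not cplock_active
```

When the predicate is false, the MBC falls back to echoing the write to the M-bus as usual.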


Cache-to-cache transfers (CTCT) optimize the speed of 
consistency actions in a multiprocessor. For a read line- 
fill by a master causing an MHITM # from a slave, the 
writeback data movements go directly into the master 
82490 over the M-bus from the dirty 82490. For a 
_ write, Read-for-Ownership (RFO) is required for the 
CTCT. If RFO is not implemented, then the cache-to- 
cache option can be used only on linefill (read) misses. 
In fact, RFO makes every write-miss into a linefill. The 
82495/82490 do CTCT only on entire lines, not bytes 
or words. 
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For CTCT ona linefill causing -MHITM#, the MBC 
doing the writeback must initiate the writeback at the 
subline address of the initial read. Starting the write- 
back from the first word of the line is NOT acceptable. 


While CTCT is faster than re-reading the line after waiting for the
dirty writeback, the latency will be longer in most systems than for
fetching lines from main memory. CTCT would actually waste time for
such items as shared instruction pages. For non-written data,
transferring from memory to a CPU is probably faster than
transferring from another cache. So the 82495 supports only M-line
CTCT (no transfer occurs unless MHITM#).


Signals involved in CTCT are DRCTM#, MZBT#, MHITM#, MBAOE#, and
MSEL#. See [82495/490DS] for CTCT information.


Snoop filtering can be implemented by the MBC using the 82495 SMLN#
(SaMe LiNe) output to reduce the latency for snooping. That is,
SWEND# can be asserted immediately to the requesting 82495 if the
82495 asserts SMLN# to indicate the current request is to the same
line as the previous request. In that case, other caches already have
checked this line. SMLN# must be ignored if the M-bus has been used
by other agents between the two 82495 requests. The M-bus protocol
need not include a "non-snooped transfer type" for the use of this
feature, as the MBC can simply ignore the snoop responses from other
MBC/82495 modules.
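
The SMLN# rule above can be pictured with a small behavioral model. This is only a sketch: the class and method names are hypothetical, and a real MBC would implement the same decision in clocked logic rather than software.

```python
class SnoopFilter:
    """Behavioral sketch of SMLN#-based snoop filtering in an MBC.

    Hypothetical model: tracks whether any other agent has used the
    M-bus since this 82495's previous request.
    """

    def __init__(self):
        # Until a first request has been snooped, nothing may be skipped.
        self.bus_used_by_others = True

    def note_other_agent_cycle(self):
        """Call whenever another agent runs a cycle on the M-bus."""
        self.bus_used_by_others = True

    def request(self, smln_asserted):
        """Return True if SWEND# may be asserted immediately."""
        skip = smln_asserted and not self.bus_used_by_others
        # This request becomes the new "previous request".
        self.bus_used_by_others = False
        return skip
```

In this model SMLN# is honored only when the bus has been quiet since the previous request, which is the condition the text places on its use.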

Split transaction (ST) memory-buses such as Futurebus+ prove valuable
in high performance systems. An ST (also called "connect/disconnect"
or "packet switching") bus divides a single read request into a
separate address-transfer phase and a data-transfer phase. Thus the
bus is not monopolized during the long latency involved in accessing
data across bus hierarchies. Writes typically are not split, as the
data and address are available simultaneously from the writer. In a
hierarchical bus, requests must be forwarded across bridges for the
purposes of snooping and memory access at remote nodes, and the snoop
latency may be long. Thus the bus should be freed between initial
request and snoop-response for use in other transactions.


The 82495 does not support ST directly. That would require snooping
current cache contents and queuing up possible writebacks for the
accesses from other bus agents between the time of the BGT# (the
address phase) and SWEND# (end of the address phase or later). Also,
the 82495 cannot write back dirty data between SWEND# and CRDY# (end
of the data phase) of an ongoing cycle; it cannot suspend a transfer
for later resumption after a snoop writeback.


CADS#        BGT#          SWEND#         CRDY#
             NNNNNNNNNNNNN DDDDDDDDDDDDDD

NN = No snooping by 82495 will occur in this area.
DD = Delayed response by 82495 to snoop requests here. MTHIT# and
     MHITM# are asserted immediately, but writeback of dirty data is
     delayed until after CRDY# for the ongoing cycle.
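
As a reading aid, the timeline above can be restated as a small classifier. This is a sketch under assumptions: the function and enum names are hypothetical, and the behavior before BGT# and after CRDY# is taken to be ordinary snooping, which the diagram implies but does not label.

```python
from enum import Enum


class SnoopBehavior(Enum):
    NORMAL = "normal snoop"
    NONE = "NN: no snooping"
    DELAYED = "DD: hit signals immediate, writeback delayed"


def snoop_behavior(bgt_seen, swend_seen, crdy_seen):
    """Classify 82495 snoop behavior within one of its own cycles.

    Arguments say which milestones of the ongoing cycle have passed.
    """
    if crdy_seen or not bgt_seen:
        # Before BGT# or after CRDY#: assumed ordinary snooping.
        return SnoopBehavior.NORMAL
    if not swend_seen:
        # Between BGT# and SWEND#.
        return SnoopBehavior.NONE
    # Between SWEND# and CRDY#.
    return SnoopBehavior.DELAYED
```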


The 82495's inability to snoop during the NN period comes from the
need to keep 2 addresses into the tags active: one for the
outstanding 82495 request, whose tag must be updated at SWEND# based
on MWB/WT# and DRCTM#, and one for the snoop inquiry. Furthermore,
any MHITM# on the M-bus could not be easily linked to the request
causing the snoop if 2 snoops are outstanding.


To support split transactions by snooping between BGT# and SWEND#, a
set of tags external to the 82495 can be implemented in the MBC.
Those tags would replicate the contents of the 82495 internal tags,
listening to all memory bus requests and responding with snoop
results. Only when a 82495 state change (to I or S) is needed will
the 82495 be informed of snooping action; only then will the external
tags relay the snoop request to it.


Duplicate tags provide quicker snoop turnaround because no
SNPCLK-to-CLK synchronization is required; the duplicate tags are in
the SNPCLK/MCLK logic. While they are a high-performance option, they
are costly and complex.


Memory cycle abort is required in multiprocessors when a snooping
82495 activates MHITM# to signal that the memory's copy of the data
requested by another 82495 is obsolete. As explained above, the
memory read or write must be INHIBITED until the writeback is done.
Depending on implementation, the original access may need to be
retried or abandoned. If CTCT and RFO are implemented, then
abandonment is probably adequate. Although the complexity of aborting
could be avoided by delaying all memory action until SWEND#, that
would decrease performance. An M-bus signal such as "SIV" (System
InterVene) or "MBOFF#" (M-bus Back OFF) allows the MBC of the snooper
to tell memory to abort.


If the M-bus is pipelined, there may be constraints on when the MBC
can assert the "abort" signal to avoid cancelling the access in
progress for the transfer preceding the one causing MHITM#.


Locking 

Locking of the M-bus using the 82495's KLOCK# output is required to
ensure atomic accesses for CPU locks. For example, memory variables
called semaphores in a multitasking airline-reservation system
prevent two processes from trying to update the same list of flight
reservations simultaneously. A task would read the value of the
semaphore in an uninterrupted read-modify-write (RMW) sequence,
asserting the CPU's LOCK# signal during the RMW to block interrupts
(see note 1) and to block locked accesses by other processors to the
same semaphore in a multiprocessor. If interrupts or other accesses
were allowed during the sequence, two processes (or processors) might
both read the semaphore as "available" (zero) and both assume
ownership, setting it to "unavailable" (nonzero). Then both might
find the same empty seat and write their individual passenger's name
in the same seat location. In the end, 101 passengers would have
tickets for a 100-seat plane flight.
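
The race described above can be reproduced with a deterministic toy model. The sketch below is illustrative only: memory is a Python dict, the function names are hypothetical, and the "locked" variant simply models an RMW that no other access can split.

```python
def acquire_unlocked_interleaved(memory):
    """Two CPUs read the semaphore before either one writes it."""
    seen = [memory["sem"], memory["sem"]]  # both reads happen first
    owners = []
    for cpu, value in enumerate(seen):
        if value == 0:           # each CPU saw "available"
            memory["sem"] = 1    # and marks it "unavailable"
            owners.append(cpu)
    return owners


def acquire_locked(memory, cpus):
    """Each CPU performs an indivisible read-modify-write."""
    owners = []
    for cpu in cpus:
        old = memory["sem"]      # read ...
        memory["sem"] = 1        # ... and write with nothing between
        if old == 0:
            owners.append(cpu)
    return owners
```

With the semaphore initially zero, the unlocked interleaving yields two owners while the locked sequence yields one, which is the double-booking scenario in the text.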


The 82495 and i860 XP CPU implement locks in a
sequentially consistent, or serializing, manner. That is, 
all data loads and stores within the locked sequence 
occur on the external bus in the same order as they 
appear in the program. Also, all accesses in the pro- 
gram before the LOCK instruction are completed be- 
fore the first locked read or write, and all the locked 
reads/writes complete before other accesses after the 
locked sequence. This sequentiality is required by the 
semaphore example above, to prevent the CPU from 
updating the reservation list before it has obtained own- 
ership using the semaphore. 


Note 1: The CPU automatically blocks interrupts during the LOCKed sequence. The bus arbiter is responsible for blocking other accesses.

The MBC must serialize by ensuring all back-invalidates from 82495
to the CPU have completed before activating BRDY# for any locked
read or write. So the MBC must postpone locked BRDY#s until CAHOLD
is inactive and SNPCYC# has been inactive at least 2 CLKs (refer to
[82495/490DS] section 5.1.1).
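
One way to picture the postponement rule is a per-clock gate. This is a minimal sketch, not the datasheet's timing: the class name is hypothetical, and counting "inactive at least 2 CLKs" as two consecutive sampled clocks is an assumption to be checked against [82495/490DS].

```python
class LockedBrdyGate:
    """Per-CLK model of when a locked BRDY# may be issued."""

    def __init__(self, min_idle_clks=2):
        self.min_idle_clks = min_idle_clks
        self.idle_clks = 0  # consecutive CLKs with SNPCYC# inactive

    def clock(self, cahold_active, snpcyc_active):
        """Sample the signals for one CLK; True means BRDY# is allowed."""
        if snpcyc_active:
            self.idle_clks = 0
        else:
            self.idle_clks += 1
        return (not cahold_active) and self.idle_clks >= self.min_idle_clks
```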


Bus Lock vs. Address Lock 


The 82495 echoes the CPU's LOCK# signal onto its KLOCK# output, and
forces all CPU accesses to go to the M-bus, even if they are 82495
cache hits. That guarantees that other processors know of the LOCK
and the accesses. The 82495 assumes a BUS LOCK, where all other
processors are kept off the bus during KLOCK# activation. Most
existing "standard" buses, such as Multibus-II, have lock protocols
which do such an exclusive lock. 82495 snoop behavior during
assertion of its own KLOCK# is undefined, since it expects no other
requests will be permitted then. The 82495's KLOCK# can remain
asserted for multiple cycles when used with the i860 XP CPU, because
the processor allows up to 32 instructions inside a LOCKed sequence.


The 32-instruction i860 XP CPU LOCKed intervals may exceed 32 CLKs,
as each instruction could take several clocks and cause a TLB miss
(the intervals would be even longer if the i860 XP CPU did data cache
line fills and line writebacks during LOCK#, but the 82495 prevents
that by making KEN# = 1). Unfortunately, this limits bus concurrency.
When several 82495s share a bus or interconnection network,
performance would improve if a LOCK# from one processor did not block
all others from accessing memory and I/O. Multiprocessors based on
the Intel486 DX CPU are not affected as severely by LOCK#, because
its lock endures only a few clocks: two memory accesses at most.


To improve performance of locks in a multiprocessor, a scheme of
ADDRESS LOCKING may be implemented. This non-blocking protocol allows
other accesses to the bus and memory in spite of LOCK# activation,
and requires only that no other CPU tries to access the same LOCKed
address. If another CPU does try to access the same location, that
second CPU must be stalled until the first LOCK is de-asserted. To
ensure that the second CPU continues to snoop accesses while stalled,
BGT# to it for its request must be delayed until the lock is
obtained, as signalled by the bus arbiter. Semaphore integrity is
preserved if all CPUs follow the software convention of locking their
RMW (Read-Modify-Write) semaphore accesses. Also by convention, the
address corresponding to the first access with LOCK# asserted is the
only locked location permitted to that processor, until LOCK#
deasserts (refer to the i860 Microprocessor Family Programmer's
Reference Manual, Intel order #240875, Section 5-14).
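
A lock manager for this scheme might behave like the sketch below. Everything here is hypothetical: the class and method names are invented for illustration, and the lock granularity (an exact address rather than a line or sector) is an assumption the text does not settle.

```python
class AddressLockManager:
    """Toy model of address-based locking in a bus arbiter or MBC."""

    def __init__(self):
        self._owner_by_addr = {}  # locked address -> owning CPU
        self._addr_by_cpu = {}    # CPU -> its single locked address

    def may_grant(self, cpu, addr, lock_asserted):
        """Return True if BGT# may be given for this request now."""
        owner = self._owner_by_addr.get(addr)
        if owner is not None and owner != cpu:
            return False  # stall: BGT# is withheld, CPU keeps snooping
        if lock_asserted and cpu not in self._addr_by_cpu:
            # First locked access defines this CPU's locked location.
            self._owner_by_addr[addr] = cpu
            self._addr_by_cpu[cpu] = addr
        return True

    def lock_deasserted(self, cpu):
        """Release the CPU's locked address when its LOCK# deasserts."""
        addr = self._addr_by_cpu.pop(cpu, None)
        if addr is not None:
            del self._owner_by_addr[addr]
```

Accesses by other CPUs to different addresses proceed while the lock is held, which is the concurrency gain over a bus lock.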


Would software want to be able to cache lockable locations? Since
they are used for interprocessor or interprocess communication, it
might seem dangerous to keep them "hidden" in a cache. However,
caching allows a CPU to read a semaphore repeatedly without
generating bus traffic, waiting until the semaphore is free as
indicated by a zero value. These reads can be done in non-locked
fashion. If a copy of the semaphore is cached, no bus traffic is used
for the reads, and the semaphore value still gets updated via the
normal MESI consistency hardware when the semaphore's owner writes it
with a new value.
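
The bus-traffic saving can be estimated with a simple count. The model below is a sketch under stated assumptions: the function name is hypothetical, the owner's releasing write is assumed to invalidate the waiter's cached copy just before the final poll, and the locked RMW that follows the spin (which always goes to the M-bus) is not counted.

```python
def bus_reads_while_spinning(polls_until_free, cacheable):
    """Count M-bus reads made by one CPU polling a semaphore.

    polls_until_free: number of polls, the last of which sees zero.
    cacheable: whether the semaphore line may be held in the cache.
    """
    bus_reads = 0
    line_valid = False
    for poll in range(1, polls_until_free + 1):
        if poll == polls_until_free:
            line_valid = False  # owner's write invalidated our copy
        if cacheable and line_valid:
            continue            # cache hit: no bus traffic
        bus_reads += 1          # miss or uncacheable read
        line_valid = cacheable
    return bus_reads
```

For 100 polls this model gives 2 bus reads when the semaphore is cacheable and 100 when it is not.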


KLOCK# deassertion for back-to-back Intel486 DX CPU locked accesses
is required of the MBC if it uses address-based locking, so that the
lock-manager knows the correct address. The i860 XP CPU always
deactivates LOCK# for at least one clock between separate locked
regions, by virtue of its deactivation in the clock after the last
locked ADS#. However, the Intel486 DX CPU deactivates LOCK# only in
the clock after the last BRDY# of the last locked access. Thus LOCK#
and KLOCK# may not deactivate when two XCHG instructions occur in
succession. The MBC can insert a deactivation of the M-bus MLOCK#
signal by knowing that all Intel486 DX CPU locked accesses are
Read-Modify-Write sequences. The MBC should deassert MLOCK#,
regardless of the KLOCK# value, after the write.


Deassertion of KLOCK# by the MBC hardware may be required in any
Intel486 DX CPU system, to avoid bus timeout and starvation of other
bus masters when a continuous stream of locked accesses occurs in one
processor's program. Without it, one processor could monopolize the
bus and prevent re-arbitration.


CPLOCK#


CPLOCK# has a purpose similar to KLOCK# in Intel486 DX CPU systems,
but is unused in i860 XP CPU systems. PLOCK# (Pseudo-LOCK) indicates
an atomic 8-byte 2-transfer write for floating-point data which
should not be interrupted. The 4-byte bus of the Intel486 DX CPU
requires 2 transfers for an 8-byte datum, and if only half the
transfer gets done before another bus master reads memory,
half-wrong data could be read.

Thus the MBC should not relinquish the bus nor require snoops of its
82495 from the time of the BGT# for the first write (when CPLOCK# was
asserted by 82495) through the BGT# of the second write. This
increases the worst-case delay of writeback for a 82495 snoop hit to
a modified line; to avoid the delay, the MBC can tie the CPLOCK#
[PLOCKEN] pin low to disable PLOCK functionality.


9.0 MORE ALTERNATIVES 


In addition to the options discussed above, several oth- 
er choices affect Memory Bus Controller design. 


M-bus clocking should be chosen to allow future versions of 82495 and
82490 at higher clock speeds. Upgrading the CPU module performance by
replacing the processor and 82495/82490 will be possible. While some
redesign of the CPU-side MBC state machines may be needed for faster
clocks, the memory bus can remain the same. Thus an asynchronous
interface with either a strobed unclocked M-bus or a clocked M-bus at
less than 50 MHz is advised. A fully synchronous M-bus/CPU MBC would
be difficult to move to higher clock speed.


One convenient way to design the MBC is with the M-bus
MCLK = 0.5*CLK. Probably it will be possible to keep the M-bus at
half the CPU CLK rate, even with faster CPUs. The big advantage of
this half-speed link is that no synchronizers are needed within the
MBC if the MCLK and CLK edges are skew-controlled. The MBC can be
totally on CLK, as in the design of Appendix B.


AP-452 


PRELIMINARY 


The choice between a Strobed or Clocked M-bus is often determined by
existing bus protocols in which 82495/82490 will be used. Most
existing buses are clocked; however, Futurebus+ requires all bus
entities to use strobed transfers, but allows an optional clocked
mode for high-speed packet transfers [Fbus90]. The tradeoffs are
shown in Table 2.


Line size and M-bus width also determine upgradability to possible
future versions of 82490 on the same M-bus, with more than 32 kB per
chip. If a higher-density 82490 becomes available, the fact that
82495 has 8k tags requires:

    128 data bytes per tag (128-byte line, or sectored 64-byte lines)
    AND
    8-byte or 16-byte memory bus width

to allow a 1 MByte or 2 MByte 82490 configuration. If a smaller bus
is used, a larger 82490 is possible, but the bus-size multiplexing
described earlier would be needed.
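
The link between tag count and cache size is simple multiplication. The sketch below assumes every one of the 8k tags is in use; the function name is hypothetical.

```python
TAGS = 8 * 1024  # the 82495 has 8k tags


def cache_bytes(data_bytes_per_tag, tags=TAGS):
    """Cache data capacity when each tag covers a fixed byte count."""
    return tags * data_bytes_per_tag
```

Under that assumption, 32, 64, and 128 data bytes per tag give 256 kBytes, 512 kBytes, and 1 MByte respectively, which is why the 1 MByte configuration calls for 128 data bytes per tag.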


Writeback (WB) cache policy is advised for high-performance
(multi)processors to limit bus traffic. However, a writethru (WT)
design is simpler for the MBC because there never is a need to
backoff the 82495 due to MHITM#. In fact, the snoop window in a WT
system becomes unnecessary and SWEND# can be activated simultaneous
with KWEND#. In such a system, the only states of cache lines are S
or I. Snooping has no effect during reads and only causes
invalidations (in the slaves) for writes in a WT design.
Cache-to-cache transfers and RFO are unused.


Table 2. Clocked vs. Strobed MBUS Tradeoffs


CLOCKED MBUS Advantages

• Design techniques for clocked systems are well known.
• Fast arbitration using MCLK state machines.
• Burst transfers proceed at one datum per MCLK.


CLOCKED MBUS Disadvantages

• Must round up delays to MCLK period quanta; e.g., a 33 ns delay
  means two 30 ns MCLKs are needed.
• Some 82495-to-82495 signals must be twice synchronized: once at
  sender, once at receiver.
• Backplane length limited.
• MCLK skew must be controlled.
• Requires assumptions on CLK vs. MCLK speed ratio: for example,
  CLK > MCLK > CLK/2.


STROBED MBUS Advantages

• Delays determined by device speed and physics, not by MCLK quanta.
• Each signal goes through a synchronizer once, only at the receiver,
  so less time is lost at synchronizers.
• Fewer limits on backplane length or capacitance or number of
  boards.
• No clock skew worries.
• Any CLK frequency will work.


STROBED MBUS Disadvantages

• MBC design may require delay lines and non-conventional design
  techniques.
• Arbitration slow because signal must be synchronized at arbiter and
  at modules.
• Burst throughput slowed if each transfer requires acknowledgement
  from receiver.
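
The round-up penalty listed for a clocked M-bus is a ceiling division. A minimal sketch, with hypothetical function names and integer nanoseconds assumed:

```python
def mclks_needed(delay_ns, mclk_period_ns):
    """Whole MCLK periods needed to cover a delay (integer ns)."""
    return -(-delay_ns // mclk_period_ns)  # ceiling division


def wasted_ns(delay_ns, mclk_period_ns):
    """Time lost to rounding the delay up to MCLK quanta."""
    return mclks_needed(delay_ns, mclk_period_ns) * mclk_period_ns - delay_ns
```

For the table's example, a 33 ns delay on a 30 ns MCLK needs two periods, so 27 ns of the 60 ns is rounding loss.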


10.0 MBC DIFFERENCES FOR i860 XP CPU VERSUS Intel486 DX CPU


The same MBC design can be used for either i860 XP CPU or Intel486
DX CPU if the MBC supersets the requirements of the two. A
"CPU_TYPE" configuration pin can be included in the MBC to modify
its behavior. First, make the features as common as possible:

• Choose a configuration acceptable for both CPUs:

  a) 256 kBytes, 4 transfers/line, 64-bit M-bus, 32-byte line.
  b) 512 kBytes, 4 transfers/line, 128-bit M-bus, 64-byte line.
  c) 256 kBytes, 8 transfers/line, 64-bit M-bus, 64-byte line.
  d) 512 kBytes, 8 transfers/line, 128-bit M-bus, 128-byte line.

• i860 XP CPU pfld data is cached in 82490; no optimizations are
  included for pfld.

• Assume that LOCK# duration does not matter (i.e., that
  back-to-back LOCK#ed requests from Intel486 DX CPUs and long LOCK#
  cycles in i860 XP CPU do not cause bus ownership timeout).

Features Strictly for the Intel486 DX CPU:

• BE7-4# for M-bus must be synthesized by the MBC from A2 and
  BE3-0#.
• CPLOCK# protection.
• WRMRST (warm reset) can be included for both CPUs, but is
  optional.
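
The byte-enable synthesis in the first item can be sketched as follows. The lane mapping is an assumption not spelled out in the text: A2 = 0 is taken to select M-bus byte lanes 3-0 and A2 = 1 lanes 7-4. The function name is hypothetical, and enables are active-low as the # suffix indicates.

```python
def synthesize_mbus_byte_enables(a2, be3_0_n):
    """Return the 8-bit active-low M-bus enables MBE7-0#.

    a2:      CPU address bit 2 (0 or 1).
    be3_0_n: CPU BE3-0# as a 4-bit value, 0 meaning byte enabled.
    """
    inactive = 0xF                       # four disabled lanes
    be = be3_0_n & 0xF
    if a2:
        return (be << 4) | inactive      # dword on lanes 7-4
    return (inactive << 4) | be          # dword on lanes 3-0
```

For example, a full dword access with A2 = 1 yields 0x0F: lanes 7-4 enabled and lanes 3-0 disabled.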


Features Strictly for the i860 XP CPU:

• Burst writes from the CPU (Length = 2 and Length = 4).
• A second 74F377 BE#-latch is needed, for i860 XP CPU pins
  BE7#-BE4#, LEN, and CACHE#. PCYC and CTYP can also be latched for
  debug purposes.
• PCHK# output from i860 XP CPU must be ignored except during the
  CLK after BRDY# comes from the MBC. PCHK# from Intel486 DX CPU is
  always valid.


Differences between the MBCs:

• Configuration pin strapping of 82495 inputs.
• Decoding CPU request burst length from CLEN1:0 (82495 pins in
  Intel486 DX CPU systems) or LEN and CACHE# (i860 XP CPU).
• CPU line length, 16 bytes vs. 32 bytes (i860 XP CPU), means that
  the Intel486 DX CPU MBC will give 2 BRDY#s for every 1 BRDY# of
  the i860 XP CPU MBC.


Differences between Intel486 DX CPU and i860 XP CPUs which have no
impact on MBC:

• Intel486 DX CPU FLUSH# input pin.
• i860 XP CPU writeback caching, HITM#, and BOFF#.
• i860 XP CPU CS8 vs. Intel486 DX CPU BS8#, BS16# (none are really
  useable).


• Intel486 DX CPU RDY# pin and interrupted bursts (not useable with
  82495).


• i860 XP CPU acknowledges HOLD during LOCK#.
• EADS# duty cycle (50% maximum for i860 XP CPU and 100% for
  Intel486 DX CPU, but handled by 82495).
• KEN# pin sampling interval by the CPU.
• Behavior of CPU in response to BOFF# assertion.
• i860 XP CPU BERR (Bus ERRor) pin versus Intel486 DX CPU NMI (Non
  Maskable Interrupt).


11.0 SUMMARY 


The interface between a CPU/82495/82490 chip set 
and a system memory bus allows much flexibility and a 
wide range of performance options. The simplest MBC 
can be a few PALs, while a top-performance multipro- 
cessing version may take thousands of gates on an 
ASIC. Signal pin counts for the MBC can range from 
70 to 120, varying with the memory bus definition im- 
plemented by the MBC. 


While beyond the scope of this document, topics for consideration
include detailed timing diagrams, critical path analysis, simulation
of bus traffic, and hit rates. Useful also are simulations of
performance impact of the number of CPUs, WB versus WT policy, memory
latency, CTCT, RFO, and duplicate tags. Also at issue are interrupt
controller hardware, PAX concurrency control, boundary scan and
selftest, PC-compatibility implications, i860 XP CPU pfld options,
and high-speed design issues of impedance, termination, and noise.


APPENDIX A:
Questions and Answers on MBC Design


Q:   Why activate BGT# early, since 82495 won't snoop between BGT#
     and SWEND#?

ANS: CNA# for MBC pipelining is ignored until BGT#. Also BGT# must
     precede CRDY# by at least 3 CLKs. And BGT# must precede BRDY#.


Q:   How does PAX multiprocessing work with 82495 and an MBC?

ANS: A CCU chip must be included on the M-bus side of 82495 and
     82490 for each i860 XP CPU in a PAX multiprocessor. Refer to
     [MPIC90].


Q:   Can the i860 XR CPU use a 82495/82490 cache?

ANS: No, the bus protocol of 82495 and 82490 matches Intel486 DX CPU
     and i860 XP CPUs, but not i860 XR CPU.


Q:   Can 2 CPUs plug into one 82495, getting efficiency from shared
     cache?

ANS: No, the protocol and physical capacitance of the interface do
     not allow it.


Q:   Should the same MBC be used for Uni & Multi? (i.e., how much
     extra logic is added to make a multiprocessor MBC?)

ANS: It is possible, and the extra logic is reasonable for a Uni
     which could be upgraded to multi by adding another CPU + cache
     module.


Q:   Are software models of 82495/82490 available for simulation of
     MBCs? What simulators are supported?

ANS: As of September 1990, beta versions of models will be available
     Q4 1990 from Silicon West, Inc. Phone = (213)597-5995,
     FAX = (213)494-4588. Contact Silicon West for information on
     simulators supported (currently Workview, Verilog, Zycad VHDL,
     Mentor Graphics).


Q:   What is the fastest possible transfer of data from Mdata to
     Cdata? (i.e., how many CPU clks are spent?)

ANS: The initial timings are listed in [82495/490DS]. They are about
     1.5 CLK periods including setup-time at the CPU data pins. The
     connection from CDATA to MDATA is essentially a flow-through
     path.


Q:   Can the CPU-bus and M-bus be on the same 50 MHz clock?

ANS: Yes, but multiprocessor memory buses probably have too much
     capacitance and trace length to tolerate a 50 MHz clock.


Q:   What are pin-counts for an MBC (i.e., will it fit in my ASIC)?

ANS: 70 to 120 signal pins, depending on the bus protocol and MBC
     features.


Q:   How long is a reasonable cacheability window, in MCLKs?

ANS: KWEND# is activated when MKEN# and MRO# are stable. MKEN# and
     MRO# can come from address decoders in the MBC or on the MBUS.
     Thus KWEND# could be 2 CLKs if the MBC itself determines
     cacheability, or as much as 5 MCLKs if the M-bus must see the
     request and determine MKEN#.


Q:   How long is a reasonable snooping window, in CLKs?

ANS: MWB/WT# and DRCTM# are generated from the snoopers' MTHIT# and
     MHITM# signals. Thus SWEND# is activated when those signals
     (MWB/WT#, DRCTM#) are stable. That would be at least 7 CLKs,
     not counting the possible delay between CADS# and its M-bus
     counterpart MADS# (see the discussion of snoop window above).


Q:   Is the SWEND# window length deterministic, or must SNPBSY#
     determine it?

ANS: It is deterministic, but may be long when the 82495 is busy.
     Yes, the SNPCYC# signal is required to determine SWEND#. If
     SNPCYC# is not used, then the worst-case 82495 delay must be
     embedded into the MBC logic, making the window longer than
     necessary most of the time.


Q:   How long can 82495 be "busy", activating SNPBSY# and ignoring
     subsequent SNPSTB# activations?

ANS: 82495 busy-ness is not due to CPU requests, because 82495 gives
     higher priority to the snoops. But for snoops to M-state 82495
     lines, 82495 must do inquiries to the i860 XP CPU and get the
     more-recently modified data from i860 XP CPU before 82495 can
     writeback. A 82495 connected to an Intel486 DX CPU does not
     need to get modified data, as the Intel486 DX CPU has only
     S-state lines in the CPU cache. However, if SNPINV was active,
     82495 must back-invalidate either CPU for S, E, or M state
     lines. The 82495 must do multiple inquiries or invalidates when
     the line ratio is 2 or 4.


Q:   What is the synchronization penalty in snooping (i.e., how long
     from M-bus request to MHITM# validity)?

ANS: About 3 CLKs. See the discussion of "snoop window" above.


Q:   What is optimal 82495 cache line length (32, 64, 128)?

ANS: This is TBD from simulations or measurements. It depends on the
     behavior of SW applications the HW is intended for.


Q:   Can Futurebus+ be used as the M-bus for a 82495/82490 system?

ANS: Yes. The Futurebus+ spec is compatible with the 82495/82490. It
     supports MESI, strobed data transfer, address pipelining, cache
     to cache transfers, Read For Ownership, and many other
     features. 82490 would be used in strobed mode for Futurebus+.


Q:   Can 82495 do a split-transaction bus (if not, why not?)?

ANS: Maybe. 82495 implements a restricted-backoff protocol to
     eliminate potential deadlock conditions in a shared bus
     multiprocessor environment. Because of that protocol, and the
     fact that 82495 will not snoop between BGT# and SWEND#, it is
     difficult to implement split transactions. It may be possible,
     using an additional set of tags which replicate 82495's and
     allow snoops to continue between BGT# and SWEND#.


Q:   Can another 82495 be used for the "duplicate tags" for split
     transaction snooping?

ANS: No, the 82495 signal definitions and protocols make that very
     difficult.


Q:   Why do the KWEND# and SWEND# signals exist?

ANS: SWEND#, by gating 82490-to-CPU-data-transfer, allows the M-bus
     data transfer simultaneous with snooping. In the usual case, no
     modified copy will be found by the snoopers, so that transfer
     was not wasted. The alternative (that data cannot be
     transferred from memory until snoops complete) costs
     performance or requires a central tag directory. SWEND#
     triggers the 82495 to update its tags.

     KWEND# allows a variety of cacheability determination schemes.
     A long delay to determine MKEN# and MRO# might be needed if a
     programmable RAM or EEPROM decodes cacheability based on
     address. If not, KWEND# can be activated quickly if there is a
     local MBC decode of A31:A28 to determine MKEN#, for example.
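
     A local decode of A31:A28 could look like the sketch below. The
     memory map is entirely hypothetical (only the top 256 MByte
     region is treated as non-cacheable), as are the names; MKEN# is
     modeled as active-low, so 0 means cacheable.

```python
# Hypothetical map: region 0xF (top 256 MBytes) holds ROM and I/O.
NONCACHEABLE_REGIONS = {0xF}


def mken_n(address):
    """Return MKEN# (0 = cacheable) from address bits A31:A28."""
    region = (address >> 28) & 0xF
    return 1 if region in NONCACHEABLE_REGIONS else 0
```

     Because the decision depends on only four address bits, it can
     settle quickly, which is what lets KWEND# be activated early.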


Q:   Why not just one WEND signal?

ANS: Performance. KWEND# can be determined quicker than line-status
     in most implementations. The early knowledge of cacheability to
     the 82495 allows it to begin line replacements and allocations,
     and activate the next CADS# to MBC.


Q:   How to connect 8-bit (or 16-bit) devices such as ROM and serial
     ports to 82490?

ANS: If the devices are made non-cacheable, they can be tied to the
     MDATA pins of the least-significant bytes. However, if fetches
     from them must be cacheable, then byte assembly logic (latching
     transceivers) must exist to allow 82490 to transfer from them 4
     or 8 bytes at a time (1 M-bus width per transfer). 82495 and
     82490 require all cacheable locations to do burst transfers an
     M-bus-width of data per transfer.


Q:   Does the 82495 have a CS8 mode? Does 82495 support i860 XP CPU
     in CS8 mode?

ANS: To support i860 XP CPU CS8 mode with 82495, the 8-bit ROM must
     be marked non-cacheable. This means that code being fetched in
     CS8 mode won't be cacheable in the 82495 or the i860 XP CPU.
     For an 8-byte M-bus, the ROM data pins must be wired to the
     M-bus (MDATA of 82490) bits 7:0. For a 16-byte M-bus, the ROM
     must attach to M-bus bits 7:0 AND bits 71:64, which would
     require an 8-bit transceiver at the ROM.


Q:   Should the DRAM controller be part of the MBC?

ANS: For a simple uniprocessor, perhaps. Multiprocessors would have
     a DRAM controller for (each bank of) main memory, separate from
     the MBCs.


Q:   How can the MBC implement retry upon an M-bus parity error?

ANS: The MBC must re-issue the initial request, and reset the 82490
     transfer logic using the MSEL# signal.


Q:   Can 82490 use an ECC corrected-bus?

ANS: ECC (Error Correcting Code) can be used on the main memory bus,
     but the ECC check bits must be converted to parity or discarded
     before feeding the 82490. ECC would have to be generated at the
     82490 MDATA pins for writes to memory.


Q:   Can the MBC implement cache-to-cache transfer on a write?

ANS: No, the 82490 cannot "snarf" write data. That is, it does not
     merge a write (partial line) from the M-bus with existing cached
     lines. It can do Read-For-Ownership, merging write-miss data
     with an incoming line writeback from another cache.


Q:   Can semaphores be cached in 82495/82490?

ANS: Yes, but all read/writes which are locked are forced onto
     M-bus. So the semaphore would be read repeatedly without
     locking, until it is "free". Then SW would re-read it in locked
     fashion to obtain ownership.


Q:   Is there any advantage to making semaphores cacheable, if all
     locked accesses go to M-bus?

ANS: Yes, SW can repeatedly read the semaphore without LOCKing it,
     and no bus traffic thus is generated, waiting for the release of
     the semaphore by any other master.


Q:   Can a single multiplexed address+data bus (like Multibus-II) be
     used for M-bus?

ANS: Yes, but transceivers external to the 82495 and 82490 are
     required.


Q:   How does the MBC implement a "BACKOFF" when another 82495
     activates MHITM#?

ANS: If the data requested from a master 82495 is Modified in a
     snooper 82495, the master MBC must postpone CRDY# until the
     modified line is deposited in the master 82490, after the
     snooper flushes the modified line to M-bus.


Q:   Can MBC duplicate the CPU cache tags, to avoid unnecessary
     inquire cycles?

ANS: Yes, but the performance benefit may not warrant the extra
     hardware.


Q:   Can i860 XP CPU Late-Backoff mode be used with 82495?

ANS: No.


Q:   What are the advantages and disadvantages of doing an
     asynchronous system (where MCLK is not the same as CLK)?

ANS: Designers can easily upgrade the CPU side to higher frequencies
     (above 50 MHz) by faster PLDs in the CPU side of the MBC. The
     M-bus interface and all modules on the M-bus will not need to
     be changed. It is easier to design a board when most parts run
     at a lower frequency.


Q:   If the 82490 is reading information from the memory bus and the
     MBC is generating BRDY#s (RDYSRC = 1), can the MBC abort the
     cycle by giving a premature CRDY#, and restart it?

ANS: The MBC can abort a memory bus cycle but cannot abort a CPU-bus
     cycle. Once the first BRDY# is generated the cycle must
     complete. On the memory bus, a cycle is not aborted by giving
     an early CRDY#. In fact the 82495 does not understand that a
     cycle has been aborted. Only the MBC and 82490 are involved.
     The 82490 allows its buffer to be reset using the MSEL# signal.


Q:   What is the purpose of 82490 having a separate MOCLK for output
     data, in addition to the MCLK for input signals?

ANS: MOCLK allows greater hold time for writes from 82495, if it is
     skewed slightly from the MCLK which M-bus receivers use. MOCLK
     and MCLK must be exactly the same frequency. If the skew is not
     needed, MOCLK can be tied low.


Q:   How many levels of pipelining can the 82495 use on the external
     memory bus?

ANS: Each 82495 can use one level of pipeline on the memory bus, so
     the bus pipe depth can be greater in a multiprocessor. A
     uniprocessor allows just one level of M-bus pipeline.


APPENDIX B:
Intel486 DX CPU Uniprocessor MBC Design


Please refer to Application Note AP-458, Designing a 
Memory Bus Controller for a 50 MHz Intel486 DX Mi- 
croprocessor Based System. (Intel order #241166). 


APPENDIX C:
i860™ XP CPU DUAL-PROCESSOR MBC


OVERVIEW 


This section presents a design for a memory bus con- 
troller for a system containing two i860 XP processors, 
each with an 82495XP/82490XP secondary cache. This 
MBC, together with an i860 XP CPU, 82495XP, and 
82490XP, comprises a core which interacts with a 
memory bus utilizing a bus protocol similar to that of 
the i860 XP CPU. 


The design presented here features an i860 XP CPU 
and 256 KB of 82495/82490 cache running at 50 MHz 
in each core. The clocked 64 bit (+8 parity) memory 
bus is asynchronous to the CPU and cache clock, al- 
lowing memory to run at lower speeds for more eco- 
nomical and convenient memory design. The MBC fea- 
tures snooping and pipelining to the memory, as well as 
advanced 82495 processes like write allocation, read for 
ownership and cache-to-cache transfers.




ASSUMPTIONS 


The implementation presented here is a two processor
design which can be extended to more than two CPUs.
The definitions and examples given in this appendix are
specific to the two processor version. The section
Extension to 3 or More Processors gives specifics for
larger systems based on this design.


The memory bus is 64 bits data plus 8 bits parity. 


The MBC design allows the processor to run at a higher
clock frequency than the memory bus. The frequencies
are constrained such that the ratio of the frequency
of the processor CLK and the frequency of the memory
bus MCLK is between 1 and 2:


CLK/2 < MCLK < CLK


[Figure C-1 diagram not reproduced: MBC pinout showing the signals between the MBC and the i860 XP CPU, the 82495XP, the 82490XP, and the M-bus. Pin counts as printed: 19 i860 XP CPU, 42 82495XP, 11 82490XP, 38 M-Bus, 111 TOTAL.]

Figure C-1. Pinout Environment of MBC 
This constraint ensures proper synchronization of signals
which cross between the MCLK portion of the
MBC and the CLK portion. The prototype was designed
and simulated with a CPU speed of 50 MHz and
a memory bus speed of 33 MHz.
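

The clock constraint can be checked numerically. The following is a minimal sketch, not part of the MBC design itself; the function name is hypothetical, and excluding the end points of the range is an assumption taken from the strict inequality signs in the printed expression.

```python
def mclk_ratio_ok(clk_mhz: float, mclk_mhz: float) -> bool:
    """Check that CLK/MCLK lies between 1 and 2 (CLK/2 < MCLK < CLK).

    Assumption: the end points (ratio exactly 1 or exactly 2) are
    excluded, following the strict '<' signs in the text.
    """
    if clk_mhz <= 0 or mclk_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    ratio = clk_mhz / mclk_mhz
    return 1.0 < ratio < 2.0


# The prototype values from the text: 50 MHz CPU, 33 MHz memory bus.
print(mclk_ratio_ok(50.0, 33.0))
```

With the prototype frequencies the ratio is about 1.5, which falls inside the permitted range.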


Snooping mode can be independently set to strobed or
clocked in each core.


The main memory is responsible for returning the
MKEN# attribute to the memory bus controller in the
MCLK following MADS# assertion.


To save synchronization clocks, the MBRDY# signal
of the protocol is defined to be asserted one MCLK
before data is actually available.


The 82495 operates with 32 bytes/line, 1 line/sector,
and requires 4 memory bus transfers per line fill.


OPTIONS 


With modifications the 82495 can operate in a mode 
with 64 bytes/line, 1 line/sector, requiring 8 memory 
bus transfers per line fill. 


The design here utilizes the 82490’s clocked memory 
bus mode. The strobed mode can also be utilized by 
making modification to the design. 


Support for various 82495 PFLD modes can be added
to the design.


Operation with either write-through or write-once pro- 
tocol can be performed. 


MEMORY BUS PROTOCOL 


M-bus Signals


The system M-bus resembles the i860 XP CPU bus. It
allows CPU modules with or without external cache on
the same M-bus, so that a balance between high performance
and low cost can be achieved. The signal specifications
below indicate Input (I), Output (O), or bidirectional
(I/O) from the MBC’s point of view. Output
signals to the memory bus such as MADS#, MLEN,
and MA31:MA3 are floated by all MBCs except the
one currently owning the M-bus.


Signals whose names begin with Y (as in YBGT#) are
in the MCLK side of the MBC, while an X prefixed 
name is in the CPU CLK side of the MBC. The X and 
Y signals are internal to the MBC. 
MRESET (I) - Memory bus RESET


This signal forces the CPU to begin execution in a
known state. It resets all MBC machines which are 
driven by MCLK. It is also synchronized (via a 2-stage 
synchronizer) to CLK and fed to the RESET inputs of 
the CPU, 82495, 82490s and all MBC machines which 
are driven by CLK. 


MADS# (I/O) - Memory bus ADdress Strobe


This signal indicates that a new valid bus cycle is currently
being driven. The cycle address (A31:A3) and
cycle specifications are valid in the MCLK that
MADS# is asserted. A pipelined MADS# will be issued
only after the MBC knows that the current cycle
is guaranteed not to be aborted. For most memory accesses,
the master will assert MSNPSTB# to snoop
other caches on the bus. When MSNPSTB# has been
asserted, MNA# will cause a new MADS# to be issued
after MSWENDI# signifies snooping has completed.
Furthermore, if MHITMI# was asserted with
MSWENDI# in this case, the new MADS# cannot be
issued until after the current cycle (now a snoop write-back)
has been completed. When MHITMI# is not
asserted with MSWENDI#, MADS# can be asserted
immediately following MSWENDI#. If MSNPSTB#
was not asserted for the current cycle, then MADS#
could be issued immediately after MNA#, without
waiting for MSWENDI#.
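

The rule for issuing a pipelined MADS# can be summarized as a small decision function. This is an illustrative sketch only, assuming the MBC tracks the conditions below as simple flags; the class, field, and function names are hypothetical and do not come from the MBC PAL equations.

```python
from dataclasses import dataclass


@dataclass
class CurrentCycle:
    snooped: bool          # MSNPSTB# was asserted for the current cycle
    mna_seen: bool         # MNA# has been sampled active
    swend_seen: bool       # MSWENDI# has been sampled active
    hitm_with_swend: bool  # MHITMI# was active when MSWENDI# was sampled
    completed: bool        # current cycle (or its snoop write-back) is done


def may_issue_next_mads(c: CurrentCycle) -> bool:
    """Return True when the next MADS# may be issued."""
    if c.completed:
        return True              # nothing outstanding: no pipelining involved
    if not c.mna_seen:
        return False             # memory has not asked for the next address
    if not c.snooped:
        return True              # no snoop pending: issue right after MNA#
    if not c.swend_seen:
        return False             # snoop result not yet known
    return not c.hitm_with_swend  # hit to modified line: wait for completion


# Snooped cycle, MNA# and MSWENDI# seen, no modified hit.
print(may_issue_next_mads(CurrentCycle(True, True, True, False, False)))
```

The only case that waits for full completion is a snoop hit to a modified line, because the current cycle has by then turned into a snoop write-back.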


For read cycles MADS# is issued after CADS#, regardless
of CDTS# state. Requesting the memory bus,
via MBREQ, is also done immediately after CADS#.
This is due to the fact that CDTS# in a read cycle does
not affect the memory bus, but indicates when the first
BRDY# can be issued to the CPU.


For memory writes MADS# is issued only after
CDTS#. Requesting the memory bus, via MBREQ, is
also done after CDTS#. This guarantees that for write
cycles the memory bus data is valid 1 MCLK after
MADS# (similar to the CPU).


MNA# (I) - Memory bus Next Address
Acknowledgement


This is the memory bus next address signal, driven by
the memory controller. It indicates to the MBC that
the memory bus is ready to accept a new bus cycle,
although the previous one has not been completed yet.
If the MBC has a new cycle pending and the current
cycle is guaranteed not to be aborted (see MADS#
above), then a new MADS# will be issued. Note that
the maximum level of pipelining on the memory bus is
1.
MBRDY# (I/O) - Memory bus Burst ReaDY#


This is the burst ready signal. For read cycles,
MBRDY# indicates that in the following MCLK the
memory bus will present valid data on the 82490
MDATA pins. For writes, MBRDY# indicates that in
the following MCLK the memory bus will accept the
data from the 82490 MDATA pins. Note that this signal
is active 1 MCLK before the data is available on the
memory data bus. This reduces the synchronization
penalty between the M-bus and CPUbus by 1 MCLK
period.


For a clocked-asynchronous MBC, MBRDY# is delayed
by the MBC 1 MCLK and passed to the 82490
MBRDY# pin. For a strobed-asynchronous MBC, the
82490 MISTB and MOSTB will change value in response
to MBRDY#.


For Cache to Cache Transfers, the MBC with the Modified
line drives MBRDY# active once per MCLK
without wait states for the duration of the line burst.


MSNPSTB# (I/O) - Memory bus SNPSTB#


This is the memory bus snoop strobe signal. It is asserted
1 MCLK after MADS# by the MBC which asserted
MADS#, for all cycles that could be M-state in the
other MBC. In writebacks and I/O cycles,
MSNPSTB# is not asserted. The MSNPSTB# output
of each MBC is connected to the 82495 SNPSTB# input
of the other MBC, in this two processor design.


MSWENDO# (O) - Memory bus SWEND#
Output


This is the memory bus snoop window end indication
which is driven by the snooping MBC. It is connected
to the master MBC’s SWENDI# input, indicating that
snooping is finished and the snoop attributes are valid.


MSWENDO# is an asynchronous signal which is
triggered by the 82495 SNPCYC# falling edge, and
is negated after sampling an active SNPSTB#.
MSWENDO# of one MBC is connected directly to the
MSWENDI# input of the other MBC.


MSWENDI# (I) - Memory bus SWEND# Input


MSWENDI# is connected directly to the other core’s
MSWENDO# output. It is internally sent to two synchronizers:
synchronized to CLK to generate 82495
SWEND#, and synchronized to MCLK for MBC state
machines which determine whether the current bus cycle
should be aborted.
MSWENDI# indicates the end of the snoop window
and that the snoop results MHITMO# and MTHIT#
are valid. An active MHITMI# indicates a snoop hit
to a modified line, and causes the master MBC to discard
any data which has arrived from main memory, so
that new data, which is being written out as the snooping
core performs a snoop write back, can be accepted.
MTHIT# of each core is connected to the
MWB/WT# input of the other core, to generate the
WB/WT# signal to the 82495.


MHITMO# (O) - Memory bus HITM# Output


This indicates a snoop hit to a modified line. In the two 
processor implementation of this MBC, it is connected 
directly to the other MBC’s MHITMI# input. 


MHITMI# (I) - Memory bus HITM# Input


MHITMI# is connected to the MHITMO# output of
the other MBC, and determines if MBOFF# and
MABORT# will be asserted. It is sampled on
MSWENDI# activation.


MTHIT# (O) - Memory bus Snoop Hit Indication 


This snoop hit indication is based on the 82495 
MTHIT# output. The MTHIT# output of the snooping
core is used by the master core to determine the
WB/WT# state for the accessed line. The 82495
MTHIT# signal is passed directly onto the memory
bus when the SNPINV signal is inactive for the snoop.
On snoops with SNPINV active, the memory bus
MTHIT# line is driven low, regardless of the value at
the 82495 MTHIT# pin.


The MTHIT# signals from the memory bus control- 
lers on the bus are wire-anded together. Because the 
82495 MTHIT# output only changes state with each 
new snoop, the master memory bus controller must 
float its MTHIT#. 
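

The MTHIT# behavior described above can be modeled in a few lines. This is a sketch under stated assumptions: signal levels are represented as 0 (low) and 1 (high), a floated driver is represented as None, and an undriven wire-anded line is assumed to read high. The function names are hypothetical.

```python
from typing import Iterable, Optional


def mthit_drive(cache_mthit: int, snpinv_active: bool,
                is_master: bool) -> Optional[int]:
    """Level one MBC drives onto the memory bus MTHIT# line.

    Returns None when the driver is floated (the bus master).
    """
    if is_master:
        return None          # the master MBC floats its MTHIT#
    if snpinv_active:
        return 0             # driven low regardless of the 82495 pin
    return cache_mthit       # otherwise the 82495 MTHIT# passes through


def wired_and(levels: Iterable[Optional[int]]) -> int:
    """Resolve a wire-anded line; floated drivers do not contribute.

    Assumption: with no active driver the line reads high.
    """
    result = 1
    for level in levels:
        if level is not None:
            result &= level
    return result


# Master floats, snooper passes through a high MTHIT# from its 82495.
bus = wired_and([mthit_drive(0, False, True), mthit_drive(1, False, False)])
print(bus)
```

Floating the master's driver matters because its 82495 MTHIT# output still holds the result of an earlier snoop and would otherwise corrupt the wire-anded result.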


MBOFF# (O) - Memory bus BOFF#


This is the memory bus back-off signal which is driven
by the master MBC. The master MBC floats its bus
concurrent with MBOFF# activation. When the
snooper MBC samples an active MBOFF# and it has a
pending snoop write-back cycle, it issues the cycle to
the memory bus. Note that the snooper issues the cycle
even though it is still in a bus hold state (MHLDA
asserted). If MHITMI# is sampled active during
MSWENDI# and the previous cycle has completed,
then MBOFF# will be asserted immediately after
MSWENDI#. If the previous cycle has not completed
and the pipelined cycle hits a modified line, then
MBOFF# will be asserted only after the previous cycle
completes. The snooping MBC floats its bus only after
the snoop write-back cycle has completed. Note that
from the arbiter’s viewpoint the bus is still granted to
the master MBC.


MABORT# (O) - Memory bus Abort


This is the memory bus abortion signal which is driven
by the master MBC. When the main memory samples
an active MABORT# it aborts any cycle that is currently
being serviced. The memory aborts the cycle regardless
of the number of MBRDYs that have been
issued. Thus MBRDY# of the aborted cycle will not
be issued after MABORT#. A new cycle could be serviced
immediately after MABORT#.


If MHITMI# is sampled active during MSWENDI#
and the previous cycle has been completed, then MABORT#
is asserted immediately after MSWENDI#.
If the previous cycle has not been completed and the
pipelined cycle hits a modified line, then MABORT#
is asserted only after the current cycle has completed.


MABORT# can also be asserted during read for ownership
with a hidden write (allocation after a non-completed
write in the main memory). In this case if the
master MBC samples an active MKEN# (1 MCLK
after MADS#) during a potentially allocatable write
cycle, it asserts MABORT# immediately, i.e. 2
MCLKs after MADS#.


Note that MABORT# is always guaranteed to be a 1
MCLK width pulse.
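

The two MABORT# conditions can be expressed as MCLK arithmetic. The sketch below is illustrative only: the function names are hypothetical, and reading "immediately after" as "in the next MCLK" is an assumption (it agrees with the clock numbers quoted for Figures C-3, C-4 and C-5 later in this appendix).

```python
from typing import Optional


def snoop_abort_clock(swend_clk: int, hitm_active: bool,
                      prev_cycle_done_clk: Optional[int]) -> Optional[int]:
    """MCLK in which MABORT# is pulsed after a snoop hit to a modified line.

    prev_cycle_done_clk is None when no earlier cycle is still outstanding.
    Returns None when MHITMI# was not active with MSWENDI#.
    """
    if not hitm_active:
        return None
    earliest = swend_clk + 1                  # MCLK after MSWENDI#
    if prev_cycle_done_clk is None:
        return earliest                       # non-pipelined case
    return max(earliest, prev_cycle_done_clk + 1)  # wait for earlier cycle


def allocation_abort_clock(mads_clk: int, potentially_allocatable_write: bool,
                           mken_active: bool) -> Optional[int]:
    """MCLK of the MABORT# that precedes a read for ownership."""
    if potentially_allocatable_write and mken_active:
        return mads_clk + 2   # MKEN# sampled at +1, MABORT# at +2
    return None


print(snoop_abort_clock(4, True, None))       # non-pipelined example
print(allocation_abort_clock(1, True, True))  # write-allocate example
```

In both cases the pulse lasts a single MCLK, so the functions return one clock number rather than an interval.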


MLOCK# (I/O) - Memory bus LOCK 


This signal does not exist in the current implementation.
Instead, the MBC simply refuses to give up the
M-bus to the arbiter when it is running locked accesses. 


MHOLD (I) - Memory bus Hold Request 


When this input to the MBC is asserted, the MBC as- 
serts MHLDA and floats all inputs and outputs except 
MBREQ, MHLDA, MSWENDO#, and MBOFF#. If
the MBC has outstanding bus cycles in progress 
(MADS# has been asserted), they are completed be- 
fore the MBC relinquishes the bus. MHOLD is recog- 
nized during MRESET assertion. 
MHLDA (O) - Memory bus Hold Acknowledge


The memory bus hold acknowledge signal goes active
when an MBC relinquishes the bus in response to an
MHOLD request. The memory bus controller floats its
bus in the same MCLK that it issues the MHLDA.
When the MBC leaves bus hold, MHLDA is negated
and the core resumes driving the bus. If a cycle is pending
when leaving bus hold, the MADS# will be issued
in the same MCLK that MHLDA is negated.


MINT (I) - Memory bus Interrupt 


This interrupt signal is connected directly to the i860
XP CPU in the core.


MKEN# (I) - Memory bus KEN#


This is the memory bus cache enable signal. It is used
by the MBC to determine the length of the current bus
cycle, and is also connected directly to the 82495
MKEN# input.


In potentially cacheable read cycles, it determines cycle
length. In potentially allocatable write cycles, it determines
whether read for ownership with hidden write
will be performed.


In the current implementation, MKEN# must be driven
by the memory controller in the MCLK after
MADS# was issued.


MRO# (I) - Memory bus Read Only 


Assertion of this signal causes an access to be treated as 
read only by the core. This signal is connected directly 
to the 82495 MRO# input, as well as to the MBC. 


MWB/WT# (I) - Memory bus WB/WT#


This is the write-back/write-through input connected
to the memory bus. It is connected through MBC logic
to the 82495 MWB/WT# input.


MDRCTM (I) - Memory bus Direct-to-M 


This is the memory bus DRCTM# signal which forces
a line entering the cache to be placed directly in the
[M] (modified) state. In addition to this signal which is
connected from the memory bus to the 82495, the MBC
can internally drive the 82495’s DRCTM# pin during
read-for-ownership cycles.
MFLUSH#, MSYNC# (I) - Memory bus
FLUSH#, SYNC#


These signals cause the core to flush or sync its cache,
by asserting FLUSH# or SYNC# to the 82495, respectively.
The signals are driven by the main memory
controller upon detecting a core flush or sync command,
which consists of a special cycle with either
MBE1# or MBE3# active, respectively.


MBREQ (O) - Memory bus Request


The MBREQ# signal is asserted by an MBC to indi- 
cate to the memory bus arbiter that the MBC needs the 
memory bus. An MBC will generate this signal regard- 
less of whether or not the MBC is currently driving the 
bus. 
MBREQ# is not issued for snoop write-back cycles. If
the snooping core already had its MBREQ# pin assert- 
ed, the pending cycle which caused the MBREQ# is 
aborted by the snoop write-back, according to 82495 
protocol. The MBC state machines of the snooper, 
however, continue to assert MBREQ# until an internal 
time-out period has elapsed, allowing the snooping 
82495 to reissue the aborted cycle after the snoop write- 
back has completed. Therefore a core which is waiting 
for the bus can service a snoop write-back without los- 
ing its request for the bus. 


MLEN (O) - Memory bus LEN


This signal together with MCACHE#, MW/R# and
MKEN# determine the memory bus cycle length according
to the following table:


MW/R#   MLEN   MCACHE#   MKEN#   Length   Notes


NOTES:


1. Locked i860 XP CPU write-back cycles (length = 4), caused by the i860 XP CPU executing a FLUSH instruction during a
LOCKed sequence, are treated as normal write cycles (length = 1 or 2 according to LEN). This is allowed since i860 XP CPU
write-back cycles always access an 82495 modified line (in [M] state) and are only written into the 82490, without updating
memory.


2. MKEN# must be driven valid the clock following MADS# by the memory controller.


MM/IO#, MD/C# (O) - Memory bus M/IO# and
D/C#


These signals, together with MW/R#, define the memory
bus cycle, according to the i860 XP CPU Data
Sheet. They are driven in the same MCLK as
MADS#.


MBE[7:0]# (O) - Memory bus BE[7:0]#


The byte enable signals to the memory bus identify
which bytes are being accessed. They are identical to
the CPU byte enables on CPU generated cycles. For
82495 generated cycles (write-backs and allocations) all
MBE#s are asserted.


MCACHE# (I/O) - Memory bus CACHE#


In a master core MCACHE# is an output; in a snooping
core it is an input. As an output, it indicates potentially
cacheable reads or an 82495 write-back.
MCACHE# is used by the system memory together
with MLEN, MW/R# and MKEN# to determine cycle
length. As an input, MCACHE# is connected to
the 82495 SNPNCA pin.


MW/R# (I/O) - Memory bus W/R# 


This signal is an output for a master core, an input for a
snooping core. As an output, it indicates whether the
memory access is a read or a write, and is used by the
system memory along with MM/IO# and MD/C# to
determine the cycle type, according to the i860 XP
CPU Data Sheet. As an input, the signal is connected
directly to the 82495 SNPINV pin.


MA[31:3] (I/O) - Memory bus Address 


These are the memory bus address lines of the MBC.
Along with the byte enable signals, they define the
physical area of memory or I/O accesses. In a master
MBC they are driven by the 82495 onto the memory
bus together with MADS# (same MCLK). In a snooping
MBC, these lines are inputs to the 82495 which are
latched by the MSNPSTB# signal.
MD[63:0], MDP[7:0] (I/O) - Memory bus Data
and Data Parity


64 bits of data, 8 bits of parity are connected via
transceivers to the i860 XP CPU and 82490s. When an
MBC does not own the bus, these pins are tristated.


XAS#/XSAS# - X Unit Address Strobe


XAS# is generated in the X-unit (sync to CLK), and is
synchronized and sent to the Y-unit as XSAS#.


XAS# indicates the start of a memory bus cycle from
the X-unit (CLK side). XAS# is generated as a result
of a CADS# from the 82495 on a read cycle or
CDTS# from the 82495 on a write cycle. XAS# is
held active until the X-unit receives YSBGT#.


YBGT#/YSBGT# - Memory bus Guaranteed
Transfer


YBGT# is generated in the Y-unit, and is synchronized
and sent as YSBGT# to the X-unit.


This signal is generated in the Y-unit after MADS#
(the cycle has been issued on the memory bus). When
YSBGT# arrives at the X-unit, the signal causes assertion
of the 82495’s BGT# input, and one clock later
(non-pipelined cycle) the assertion of KWEND#.


YSBGT# of a pipelined cycle (which is sampled during
the initial cycle, i.e. before its CRDY#) causes the
BGT# and KWEND# of the pipelined cycle to be
issued immediately after CRDY# of the initial cycle.


YBGT# of a pipelined cycle cannot be issued before
the MSWEND# of the previous cycle. This is guaranteed
by the M-bus protocol, which ensures that a pipelined
MADS# is not issued until after the
MSWEND# of the previous cycle.


BGT#, KWEND# (O) - Bus Guaranteed
Transfer, Cache Window End to 82495


BGT# and KWEND# are generated for every cycle
(including snoop write-backs). In a non-pipelined cycle
BGT# is issued immediately after sampling YSBGT#
active, and KWEND# is issued 1 clock later. In pipelined
cycles, these signals are asserted after the
CRDY# of the initial cycle.


YMEOC#/YSMEOC# (O) - MBC Memory End Of Cycle


YMEOC# is generated in the Y unit, and is synchronous
to MCLK, and sent to the X-unit as YSMEOC#.
It indicates the M-bus transfer has finished, based on
the MBC’s transfer length count. YMEOC# directly
drives the 82490s’ MEOC# inputs. YSMEOC# causes
generation of the CRDY# signal to the 82495 and
82490s. For non-pipelined cycles CRDY# is issued immediately
after an active YSMEOC# (if CDTS# was
issued). For pipelined cycles CRDY# is issued after
the CRDY# of the previous cycle (if YMEOC#,
CDTS# of the pipelined cycle were issued).


YMEOC# is issued at least 2 MCLKs after YBGT#
(for every cycle).


YCEOC#/YSCEOC# - MBC CPU End Of Cycle


This signal is internal to the MBC: YCEOC# is generated
synchronous to MCLK, and is synchronized to
CLK to produce YSCEOC#. It indicates that the
CPUbus transfer has finished, based on the MBC’s
transfer length count. It generates the BRDY#s to the
82495, 82490, CPU, and to other MBC machines. For
non-pipelined cycles all BRDY#s except the first are
issued immediately after an active YCEOC# (if
CDTS# was issued). For pipelined cycles all BRDY#s
except the first are issued after the CRDY# or the last
BRDY# (BRDY# * -CLEN1) of the previous cycle.


YCEOC# can be issued before, with, or 1 clock after
YMEOC#. When the line ratio is 2 or 4, YCEOC#
precedes YMEOC# by a significant time, allowing
CPU linefills to complete long before the M-bus transfer
completes.


YCEOC# is asserted only if RDYSRC is active
(High).
Non-aborted Read Cycles


Figure C-2 is a timing diagram for the memory bus 
controller executing a line fill after the i860 XP CPU 
issues a read which misses the 82495/82490. The dia- 
gram reveals a number of the signals which are internal 
to the MBC, to provide a better perspective on the tim- 
ing of events. Note that signals which begin with an M 
are MBC signals to the memory bus. Signals that begin 
with Y originate in the Y side of the MBC which is 
synchronous to MCLK, and an X denotes origin in the 
X state machines, which are synchronous to CLK. 


The i860 XP CPU microprocessor issues a read cycle in
CLK 0, as indicated by the assertion of ADS#. The 
82495 performs the tag lookup, and finds the request a 
cache miss. In CLK 2, the 82495 issues CADS# and 
the cycle control signals, alerting the memory bus con- 
troller that a 4 transfer 82495 read is requested. 


The X side state machines, which run on the processor
CLK, issue an XAS# on the CLK after CADS# for an
82495 read cycle (CW/R# = 0). The XAS# signal
passes through the synchronizer running on MCLK to
become synchronized in two MCLKs. The synchronized
XAS# signal, called XSAS#, is sent to the Y side of
the MBC in MCLK 4.
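

The two-MCLK synchronization delay can be illustrated with a behavioral model of a two-stage synchronizer. This is a simplified sketch, assuming two flip-flops clocked on the same edge and an active-low input that idles high; it models only the latency, not metastability, and the class name is hypothetical.

```python
class TwoStageSynchronizer:
    """Behavioral model of a two flip-flop synchronizer."""

    def __init__(self, idle_level: int = 1) -> None:
        self.stage1 = idle_level   # first flop, samples the foreign-domain input
        self.stage2 = idle_level   # second flop, output to the local domain

    def clock(self, async_in: int) -> int:
        """Advance one destination clock edge and return the output."""
        # Both flops update on the same edge: stage2 takes the OLD stage1.
        self.stage2 = self.stage1
        self.stage1 = async_in
        return self.stage2


sync = TwoStageSynchronizer()
# XAS# (active low) is asserted before the second MCLK edge shown here.
xsas = [sync.clock(level) for level in [1, 0, 0, 0]]
print(xsas)  # [1, 1, 0, 0]
```

The output goes low on the second edge at which the input is seen low, which is the two-clock penalty paid each time a signal crosses between the CLK and MCLK sides of the MBC.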


In MCLK 5, XSAS# has initiated the assertion of
MBREQ to request the memory bus from the memory
bus arbiter. If the bus is already owned (or once it is
owned) by this MBC, XSAS# causes the assertion of
MADS# to the memory bus, MAOE# to the 82495,
and the internal YBGT# signal. The assertion of the
82495’s MAOE# signal allows the 82495 to drive its address
lines to the memory bus. YBGT# indicates that
the memory bus is owned by this MBC, and is sent to
the synchronizer for the X side of the MBC as well as
many Y side state machines.


On the Y side, YBGT# is used to deassert MBREQ#,
to sample YALLOC# on writes, and to initiate
MSNPSTB#. MSNPSTB# is asserted in MCLK 6 to
request a snoop in the other MBC. YBGT# is also
synchronized to CLK, appearing as YSBGT#, by
CLK 9. YSBGT# causes the assertion of BGT# to the
82495 in CLK 10, and, 1 CLK later, KWEND#. The
MKEN# input, which must be valid to the 82495
when KWEND# is asserted, must be driven by the
main memory on the MCLK after MADS# for this
implementation. These signal activities define the initiation
of normal bus cycles (as opposed to snoop write-backs).


In this particular example, the memory bus responds
quickly to the read request. Here, the memory subsystem
drives MNA# to the MBC in MCLK 6, and presents
data on the memory bus in MCLK 7. Since
MBRDY# must be driven by the memory bus 1
MCLK before data is available, MBRDY# is asserted
in MCLK 6, with successive MBRDY#s on the following
MCLKs. The YMBRDY# output of the MBC is
the MBRDY# signal delayed one clock, and drives the
MBRDY# input on the 82490s to read in the incoming
data.
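

The relationship between MBRDY# and data in this example can be written out explicitly. The helper below is a hypothetical illustration, assuming a burst with no wait states so that MBRDY# is asserted on consecutive MCLKs.

```python
from typing import List


def burst_data_clocks(first_mbrdy_clk: int, transfers: int) -> List[int]:
    """MCLKs in which data is on the memory bus for a zero-wait-state burst.

    MBRDY# is asserted one MCLK before each data transfer, so every
    data clock is the corresponding MBRDY# clock plus one.
    """
    if transfers < 1:
        raise ValueError("a burst needs at least one transfer")
    return [first_mbrdy_clk + 1 + i for i in range(transfers)]


# Line fill from the example: first MBRDY# in MCLK 6, four transfers.
print(burst_data_clocks(6, 4))  # [7, 8, 9, 10]
```

For the four-transfer line fill described here, MBRDY# is asserted in MCLKs 6 through 9 and the data slices arrive in MCLKs 7 through 10.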


While the data transfer is occurring, the second memory
bus controller responds to the snoop request for this
memory access in MCLK 8. Because the data is not
present in the cache of the other core, that MBC will
assert its MSWENDO# output with MHITMO#
driven high. These outputs of the snooping core are tied
directly to the MSWENDI# and MHITMI# inputs,
respectively, of the master core in this two core implementation.
Both of these signals are passed to the 82495
(MSWENDI# is synchronized first) as well as to the
state machines of both sides of the MBC. The arrival of
these signals allows the core to accept the data as valid,
and conclude the read operation when all of the
data has been transferred.


The arrival of the fourth MBRDY# generates the
YMEOC# and YCEOC# signals in MCLK 10.
YMEOC# drives the MEOC# input on the 82490s. In
addition, both signals are synchronized and sent to the
X side of the MBC. Upon the arrival of YSCEOC#,
the X state machines begin generating BRDY#s to the
i860 XP CPU. Upon arrival of YSMEOC#, CRDY#
is driven to the 82495, indicating the end of the cycle.
YMEOC# and YCEOC# are used to reset many of
the Y side state machines, including cycle type and
length indicators, and the drivers of 82490 signals such
as YMALE# and YMSEL#. On the X side, the reset
functions are triggered by CRDY# and the last
BRDY#.
Figure C-2. Non-Aborted Read Cycles
Aborted Non-Pipelined Cycles


Figure C-3 illustrates an aborted non-pipelined cycle.
MHITMI# is sampled active during MSWENDI#
(clock 4) indicating a snoop hit to a modified line. Since
the cycle is non-pipelined, MABORT# is issued immediately
and the core floats its bus (clock 5). Although
the bus is floated by the master core, the master still
owns the bus (MHLDA remains inactive).


MABORT# in clock 5 causes the main memory to
abort its cycle regardless of the number of MBRDYs that
have been issued. MBOFF# is also asserted in clock 5
to indicate to the snooping core that the master is floating
its signals and the write-back may begin. The main
memory floats its data bus in clock 6 in response to
MABORT#. In the following clocks a snoop write-back
cycle is performed by the snooper. The snooper
will release the bus at the end of the write-back.


Note that MSNPSTB# is not asserted during the
write-back cycle since it obviously will not hit any
cache.


Aborted Pipelined Cycles 


Figure C-4 illustrates an aborted pipelined cycle. Although
MHITMI# is sampled active during
MSWENDI# (clock 7) MABORT# will not be issued
immediately since the previous cycle has not been completed
yet. MABORT# is issued in clock 9 after
the last data slice was read into the core. The core floats
its bus and asserts MBOFF# concurrently with
MABORT#. Upon sampling MBOFF#, the snooping
MBC begins the snoop write-back in clock 10.


Write Allocate 


Figure C-5 illustrates a write cycle which is potentially
allocatable. This write is performed on the bus only in
order to sample the MKEN#, since the allocation cycle
will only be guaranteed if MKEN# is active.


MKEN# is sampled active in clock 2 causing the
MABORT# to be issued immediately. The write cycle is
aborted, even before MSWEND#, because a read for
ownership cycle is guaranteed to be performed after
the aborted write.


In clock 4 the MADS# of the allocation cycle, which
becomes the MADS# of the read for ownership cycle,
is issued. This MADS# is issued only if MSWEND#
has not been issued yet, or if MSWEND# was issued
and MHITMI# was negated. If MHITMI# is asserted
during the MSWEND# that was issued, MADS# will
not be issued (since the snooper issues its MADS#).


A second MABORT# is issued in clock 8, indicating to
the memory to abort the allocation, and to the snooper to
start flushing the modified line. Note that a second
MABORT# will be issued regardless of whether the MADS# of
the allocation was issued or not. The first MABORT#
(clock 3) aborts the write cycle in the memory module
and does not affect the snooper. The second
MABORT# (clock 8) indicates to the snooper to start
its write-back cycle (and, if the MADS# of an allocation
was issued, to also abort it in the memory module).


Figure C-3. Aborted Non-Pipelined Cycle


Figure C-4. Aborted Pipelined Cycles


MSNPSTB# is not issued for the allocation cycle since
write and allocation cycles access the same line.


If MKEN# had been negated in clock 2 then an allocation
would not have been performed and the write cycle
would have continued as a non-allocatable write cycle
(see Figure C-6).


Non-Allocatable Write 


Figure C-6 illustrates a write cycle without an alloca- 
tion. It can be either a non-potentially allocatable write 
cycle or a potentially allocatable write with inactive 
MKEN # (clock 1). 


The write cycle is aborted (MABORT# in clock 3) 
after sampling active MHITM# during MSWEND# 
(clock 2). In clock 11 the master core re-issues the 
MADS# of the aborted write cycle (after the snoop 
write-back has been completed). MSNPSTB# will not 
be issued again since the updated data had been written 
into the main memory and the snooper has gone to the 
invalid state. 


LIMITATIONS OF DESIGN 


The primary limitation of the implementation as it has 
been presented so far is that it includes only two proces- 
sors. The protocol set up in the design is not limited to 
two processors. The next section outlines the imple- 
mentation details which must be modified to extend the 
design to more than two processors. 


The design has no support for CS8 mode, so the proces- 
sors cannot be booted from 8 bit EPROMS. Instead, 
both processors boot in 64 bit mode, which may com- 
plicate the use of the design in stand-alone systems. 


The i860 XP CPU’s BERR, or Bus ERRor, input is not
utilized in this design. The pin could be used simply as
a non-maskable interrupt pin, but the memory bus controller
as designed makes no provision to use BERR to
correct a faulty bus access. Likewise, the parity check
results from the i860 XP CPU’s PCHK# pin are of
little value in this design outside of testing the i860 XP
CPU’s parity functions. The MBC itself does not check
the PCHK# output, and has no means of reissuing an
access in case of parity error.


The memory bus controller design here does not decode
and utilize the i860 XP CPU INTA cycles. The INT
pin itself is connected directly to the i860 XP CPU,
without affecting MBC operation.
Figure C-5. Potentially Allocatable Write


Figure C-6. Non-Allocatable Write


The MultiProcessor Interrupt Controller (MPIC) currently
being designed by Intel is not utilized in or supported
by this memory bus controller.


The memory bus controller’s treatment of LOCKed cy-
cles is simple and straightforward: when the 82495 issues
a memory access which is LOCKed (KLOCK# ac-
tive), the MBC will not relinquish the bus until a cycle
which is not LOCKed is issued. While this is adequate
for simple systems, it will not suffice for dual ported
memories, where a given block of memory can be ac-
cessed through more than one bus. In such systems, a
LOCK signal must be introduced to alert all possible
simultaneous users of memory that a LOCKed access is
in progress.
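

The LOCK rule described above can be illustrated with a small behavioral model. This is only a sketch: the function name and the list-of-booleans representation of bus cycles are assumptions made for illustration and are not part of the MBC design.

```python
def first_release_cycle(cycles_locked, request_index):
    """Return the index of the first cycle at which the bus can be released.

    cycles_locked[i] is True when cycle i is issued with KLOCK# active.
    request_index is the cycle during which another master asks for the bus.
    Returns None if every remaining cycle is LOCKed.
    """
    for index in range(request_index, len(cycles_locked)):
        # The MBC holds the bus until a cycle which is not LOCKed is issued.
        if not cycles_locked[index]:
            return index
    return None


if __name__ == "__main__":
    # A LOCKed read-modify-write pair followed by an ordinary cycle.
    print(first_release_cycle([True, True, False], 0))
```

The model only captures the single-bus case; as noted above, a dual ported memory would need a system-level LOCK signal that this sketch does not attempt to represent.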


EXTENSION OF DESIGN TO THREE 
OR MORE CPUs 


Two Processor Implementation 
Overview 


Figure C-7 presents a simplified view of the multipro- 
cessing signals for the two processor implementation. 
The basic address, data, and memory cycle control lines 
are attached to a common bus. Only the core which
controls the bus will drive these signals, with all other
cores floating these lines and asserting MHLDA#.


When the bus master MBC issues a cycle, the 
MCACHE# and MW/R# cycle attributes also serve 
to drive the 82495s’ SNPINV and SNPNCA inputs of 
both cores. SNPSTB# is issued by the master in the 
clock following MADS#. In reality, both cores have a 
SNPSTB# output at their Y-side state machines driv- 
ing a common line which connects to the SNPSTB# 
input of both 82495s. The core which does not own the 
bus floats its state machine driver on MHLDA, so the 
signal acts only as an input in that core. The master 
drives the SNPSTB# line, but the action of SNPSTB# 
is blocked in its own 82495 because its MAOE# signal 
is asserted. 


The results of the snoop are driven out on the snooping
core’s MTHIT# and MHITMO# outputs, and
MSWENDO# is asserted. These signals are connected
directly to the MWB/WT#, MHITMI#, and
MSWENDI# inputs in the master core, respectively.


The MBOFF# signals of the two MBCs are also con-
nected together. During MHLDA (in a snooping
MBC) MBOFF# is an input, and in the master it is an
output. If the master asserts MBOFF#, control of the
data and control busses is given to the snooping MBC
so that a snoop write-back can be performed.


Three or More Processors


This section gives one method of extending the design
given here to three or more processors. The solution
presented here assumes that no changes are made to the
state machines as they are written for the two processor
system. Instead, some minor glue logic is added to three
of the signals to make the core an element in a scalable
multiprocessing system. However, modifying the state
machines is also a plausible solution.


In an implementation with three or more processors,
the primary address, data, and cycle control lines are
still connected to a common bus, as in the two proces-
sor version. MCACHE# and MW/R# are also uti-
lized in the same way as in the two processor version: the
outputs of the cores drive a common line which in turn
also drives the 82495 SNPNCA and SNPINV inputs of
all cores.


The SNPSTB# signal connects directly from core to
core in a two processor version. In an implementation
with three or more processors, the SNPSTB# line is
simply extended to all the processors in the system.
Only the bus master will actually drive the line, and
snoopers will be floating the SNPSTB# output from
their state machines. Again, the snoop request is ig-
nored in the master because its MAOE# is asserted.
Similarly, the MBOFF# signal becomes a common line
which only the master will drive and which all other
cores will sample.
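

The single-driver behavior of the shared SNPSTB# and MBOFF# lines can be sketched as follows. The function names, the use of None for a floated output, and the default level when no core drives the line are assumptions made for illustration only.

```python
def resolve_shared_line(drivers, idle_level=1):
    """Resolve a shared line that only the bus master is allowed to drive.

    drivers holds one entry per core: 0 or 1 for a driven level,
    None for a floated (high impedance) output.
    idle_level is the level assumed when nobody drives (an assumption).
    """
    active = [level for level in drivers if level is not None]
    if len(active) > 1:
        raise ValueError("bus contention: more than one core is driving the line")
    return active[0] if active else idle_level


def snoop_strobe_acted_on(snpstb_n, maoe_asserted):
    """A core responds to SNPSTB# only if its own MAOE# is not asserted."""
    return snpstb_n == 0 and not maoe_asserted


if __name__ == "__main__":
    # Core 0 is the master and asserts SNPSTB# (drives it low).
    line = resolve_shared_line([0, None, None])
    for core, is_master in enumerate([True, False, False]):
        print(core, snoop_strobe_acted_on(line, maoe_asserted=is_master))
```

The contention check is there only to make the "one driver at a time" rule explicit; in hardware it is enforced by the state machines floating their outputs during MHLDA.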


The six signals in the upper portion of Figure C-7,
which communicate MSWEND and the snoop results
MHITMO# and MTHIT#, will require more glue
logic to extend the design to three or more processors.
The snoop results MHITMO# and MTHIT# must
now be considered for multiple cores when a snoop has
been issued, and the master MBC must not sample
these results until all snooping cores have issued their
MSWENDO#.


To resolve these issues, common bus lines carrying
these signals are introduced, where all cores have out-
puts driving these lines, and inputs to sample them. The
characteristics of such MTHIT# and MHITM# lines
are straightforward: the line should default to 1, and if
any core drives one of these outputs low, the line
should be pulled low. The MTHIT# line has the sim-
plest solution. As shown in Figure C-8, by passing the
signal which is produced by the core through an open
collector buffer, the buffered MTHIT#s can be tied to
a single line which is sampled directly by all cores’
MWB/WT# pins. The open collector buffer sinks cur-
rent like a normal gate output to drive a logic 0, but
instead of driving current for a logic 1, the open collec-
tor device assumes a high impedance state for logic 1.
Thus, if all of the cores output MTHIT# as 1, the
MTHIT# line remains at a logic 1 level because of the
pull-up resistor. If one or more cores output a logic 0,
the MTHIT# line will be pulled to the logic 0 level.
This precisely matches the desired behavior of
MTHIT# for the system: if one or more cores have
the snooped data cached, the master MWB/WT# in-
put must be asserted low. It is important to note that
the MTHIT# output of the master is floated: otherwise, because
the 82495 MTHIT# output only changes on each new
snoop, the value of the master MTHIT# output for the
previous snoop would erroneously be included in decid-
ing the level of the MTHIT# line.


Figure C-7. Interprocessor Communications in Two Processor System
[Figure 240957-29: Core B signals (O) MSWENDO#, (I) MSWENDI#, (O) MHITMO#, (I) MHITMI#, (O) MTHIT#, (I) MWB/WT#, (I/O) MBOFF#, (I/O) MSNPSTB#, (I) SNPINV, (I) SNPNCA; shared MCACHE#, MW/R#, M-DATA, M-ADDRESS, and M-CONTROL lines to memory.]


The MHITM# line follows the same principle as the
MTHIT# line. The MHITM# signal is not floated in
the master core, and poses the problem which floating
MTHIT# avoids: the value of the master’s last MHIT-
MO# output is still present when the new access is
being made. To resolve this, the inverted value of
MHLDA is ORed with MHITMO# before going to
the open collector buffer. The master’s MHLDA is al-
ways a 0, so the OR gate will always guarantee a 1
being passed from the master to the MHITM# line.
Again, if one or more of the snooping MBCs outputs a
logic 0, the MHITM# line will properly assume a 0
level.
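

The behavior of the shared MTHIT# and MHITM# lines can be checked with a short model of the open collector glue. This is a sketch under stated assumptions: the function names and the tuple layout are hypothetical, signal levels are modeled as 0 and 1, and a floated output is modeled as None.

```python
FLOATED = None  # a driver in its high impedance state


def wired_and(outputs):
    """Level of an open collector line with a pull-up resistor.

    The line is 0 if any driver sinks current, otherwise the pull-up
    holds it at 1. Floated drivers do not affect the line.
    """
    return 0 if any(level == 0 for level in outputs) else 1


def mthit_output(mthit_n, mhlda):
    """The master (MHLDA = 0) floats MTHIT#; a snooper passes it through."""
    return mthit_n if mhlda == 1 else FLOATED


def mhitm_output(mhitmo_n, mhlda):
    """MHITMO# ORed with inverted MHLDA: the master always contributes a 1."""
    return mhitmo_n | (1 - mhlda)


def snoop_result_lines(cores):
    """cores: list of (mhlda, mthit_n, mhitmo_n) tuples, one per core."""
    mthit_line = wired_and([mthit_output(t, h) for h, t, _ in cores])
    mhitm_line = wired_and([mhitm_output(m, h) for h, _, m in cores])
    return mthit_line, mhitm_line


if __name__ == "__main__":
    # The master still holds stale low outputs from a previous snoop.
    master = (0, 0, 0)
    clean_snooper = (1, 1, 1)
    modified_snooper = (1, 0, 0)
    print(snoop_result_lines([master, clean_snooper, clean_snooper]))
    print(snoop_result_lines([master, clean_snooper, modified_snooper]))
```

Because the master's stale values are either floated (MTHIT#) or forced to 1 (MHITM#), only the snooping cores determine the level of each line.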


The open collector buffer presents an easy way to add
new MBCs to the shared lines. The desired behavior of
a shared MSWENDA (MSWEND All) line is different
from the attribute lines, MTHIT# and MHITM#.
Where the master core should sample a 0 if any one or
more snooping cores drive a 0 on these attribute
lines, the master core must not receive its MSWEN-
DI# indication until all cores in the system have as-
serted their MSWENDO# output. The answer is to
invert the MSWENDO# output of each snooper, so that
a zero is driven onto the MSWENDA line while the
snoop is being performed, and a one is output once the
snoop has completed. From the MSWENDI# perspec-
tive, MSWENDI# should not be asserted at the master
core if any snooping core is still driving a zero on the
MSWENDA line (is not done snooping). Therefore, the
MSWENDA line is the opposite logic polarity of the
actual MSWENDO# signal. The master samples
MSWENDA after the signal passes through an invert-
er, to restore the logic level. The output of each core
is passed through an inverter before going to the open col-
lector buffer. The inverting device is a NAND gate be-
cause the SWENDO# signal shares the problem of
MHITM#, and must be “faked” by the master. In this
case, instead of the last snoop’s results causing the
problem, the master’s SWENDO# signal is reset to 1
(still snooping) when the SNPSTB# line is asserted.
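

The MSWENDA scheme can be modeled in the same way. The sketch below assumes that the second input of each core's NAND gate is that core's MHLDA, which the text does not state explicitly; the function names are likewise hypothetical.

```python
def mswenda_output(mswendo_n, mhlda):
    """NAND of MSWENDO# and MHLDA (the MHLDA input is an assumption).

    Snooper (MHLDA = 1): outputs the inverse of MSWENDO#, so 0 while
    still snooping and 1 when done.
    Master (MHLDA = 0): always outputs 1, "faking" a completed snoop.
    """
    return 1 - (mswendo_n & mhlda)


def master_mswendi_n(cores):
    """cores: list of (mswendo_n, mhlda). Returns the master's MSWENDI# level."""
    # Open collector outputs with a pull-up: the line is 1 only if no core sinks it.
    mswenda = 0 if any(mswenda_output(s, h) == 0 for s, h in cores) else 1
    # The master samples MSWENDA through an inverter.
    return 1 - mswenda


if __name__ == "__main__":
    master = (1, 0)          # SWENDO# reset to 1 by SNPSTB#, MHLDA = 0
    done = (0, 1)            # snooper that has asserted MSWENDO#
    still_snooping = (1, 1)  # snooper that has not finished
    print(master_mswendi_n([master, done, still_snooping]))
    print(master_mswendi_n([master, done, done]))
```

MSWENDI# therefore stays deasserted (1) until every snooping core has finished, which is the behavior the paragraph above calls for.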


Again, these simple adaptations can be implemented in
a similar manner in the logic of the state machines. The
MHITMO# line can be forced to a logic one or floated
when the core is a master (after YBGT, for example).
The MSWEND signal might be implemented as an as-
serted-high system signal, if open collector buffers are
used to attach new cores to the shared system bus.


Figure C-8. Extension Glue


STATE MACHINES AND SCHEMATICS

STATE DIAGRAMS


CADS . CWR# + CDTS# .CWR 


YSBGT# . SNPADS# 


240957-31 


RESET. CLDRV . TR4 


YSBGT . ENBGT 


RESET . CLDRV# . TR4 


CRDY . (YSBGT# + ENBGT#) 


CRDY# . (YSBGT# + ENBGT#) 
/CKENLC 
CRDY E CRDY# . YSBGT . ENBGT 


MKEN# 
TRANSPARENT » CKENY 
CKENLC# L LATCH 


XBGTKWND 


240957-32


WSDTS.(YSCEOC.ENBRDY.PNDCEOC)


ACTBRDY


CLEN1#


240957-33
XBRDY


CPUEN# 


RESET ae 
BRDY BRDY# 
CPUEN oe ALKCACHE# + CKEN# + cachen) | 


CPUEN . LEN . (LKCACHE# + CKEN# + CACHE#) BRDY 


CPUEN .LKCACHE .CKEN .CACHE | 


BRDY 


BRDY# 


BRDY 
es BRDY# - 


240957-34


CRDY . WCPLB
CNADIS# . YSBGT . WCPLB#
CRDY . WCPLB# . PBGT# . YSBGT#
CNADIS# . BGT . WCPLB


CRDY#
(PBGT + YSBGT) . WCPLB#


XCNA


240957-35


RESET . SLFTST# 


YSMEOC# 
RESET . SLFTST


YSMEOC.WSDTS# 


YSMEOC# YSMEOC.WSDTS YSMEOC# 


YSMEOC 


CBSTC# 
240957-36 
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YSCEOC# 
ENBRDY 


PNDCEOC LBRDY . YSCEOC 
ENBRDY 


LBRDY = CLEN1.BRDY 


LBRDY . YSCEOC# 
LBRDY# . YSCEOC# 


LBRDY# . YSCEOC . YSCEOC 


LBRDY # 
240957-37 


XCTRCK 


CDTS# . SNPADS# 


SNPADS 
CRDY . CDTS# 


SNPADS# . CRDY# . CDTS#
CRDY . CDTS


SNPADS


240957-38
XDTSTRCK


YSBGT#
YSMSWND#


YSMSWND


240957-39
XENBGT


240957-40
XENSWND


SNPCYC# 


MSNPSTB# + SNPCYC 
XMSWNDO 


240957 -41 


SLFTST# + FSIOUT 


SLFTST . FSIOUT . CAHOLD#
SLFTST . FSIOUT# . CAHOLD


240957-42
XSTFAIL


RESET . TR4


PBGT . YSMSWND . ENSWND


CRDY . PBGT#


BGT . SNPDIS
CRDY . PBGT . (YSMSWND . ENSWND)#


BGT . YSMSWND . ENSWND


CRDY# . SNPDIS# . PBGT . (YSMSWND . ENSWND)#
CRDY . PBGT . YSMSWND . ENSWND
/ONESWND
CRDY# . PBGT#


240957-43


LBRDY = CLEN1.BRDY 


CRDY# . LBRDY 


LBRDY#
CRDY# . LBRDY#
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YSWEHITM . YMEOC# . YPIPE 


YMEOC#


YSWEHITM . (YPIPE# + YMEOC) . MAOE
YALLOC . (WMSWND . MHITMI)#
YALLOC . MKEN . YNOPIPE . YMEOC#
YALLOC . YMEOC . YPIPE


CTCEND#


YALLOC# . WMSWND . MHITMI


ABORT


PCTCXFR
MABORT


240957-45
YABORT
240957-46


MBR0.MLEN1.MBRDY.WMSWND.MABORT#.(CLEN12+CLEN4)
+MBR1.MLEN1.WMSWND.MABORT#.(CLEN12+CLEN4)
+MBR1.MLEN2.MBRDY.WMSWND.MABORT#.(CLEN12+CLEN4)
+MBR2.MLEN2.WMSWND.MABORT#.(CLEN12+CLEN4)
+MBR1.MLEN4.MBRDY.WMSWND.MABORT#.CLEN12
+(MBR2+MBR3).MLEN4.WMSWND.MABORT#.CLEN12
+MBR3.MLEN4.MBRDY.WMSWND.MABORT#.CLEN4
+(MBR4+MBR5+MBR6+MBR7+MBR8).MLEN4.WMSWND.MABORT#.(CLEN12+CLEN4)


240957-47
YCPUEOC


YMEOC# + YPIPE . XLRDYSRC#


YCEOC . YMEOC#


YMEOC . YPIPE . XLRDYSRC#
MRESET


YCEOC# . YMEOC#
YMEOC . YPIPE . XLRDYSRC . CACHE#


YMEOC . (YPIPE# + XLRDYSRC . CACHE)


YMEOC . YPIPE . XLRDYSRC . CACHE#


YMEOC . YPIPE . XLRDYSRC . CACHE#


YMSEL#


YCEOC# . YMEOC#


240957-48
YCPULEN
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YMSWEND# 


YMSWEND# YMSWEND 




YMSWEND 


240957-49 


YENMSWND 


PXSAS = XSAS.ENXSAS.XSNPWB# 
PSWBAS = XSAS.ENXSAS.XSNPWB 


XSAS 
YENXSAS 


240957-50 
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PXSAS# + YALLOC# 


YMEOC# . YPIPE# 
YMEOC + YPIPE 


PXSAS . YALLOC 


YDRCTM 
DISWND 
YIMSWEND = MSWENDI.YALLOC#.DISWND# 


-YIMSWND 


MBRDY# 


MBRDY 
YMBRDY 
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YBGT . PXSAS 


PXSAS# 


HBASWB HBASWB 
MBREO PXSAS# . MHLDA@ nae 


SV 


PXSAS#.MHLDA _ 


YMBREQ 


240957-53 


MRESET


YMADS . PXSAS#
YSWEHITM# . PCTCXFR#


(MHOLD# + YMLOCK)
PXSAS . (MHOLD# + YMLOCK)


YNOPIPE
PCTCXFR# . RSTRT MACE
+ PCTCXFR# . RSTRT# . PXSAS . (MHOLD# + YMLOCK)


{/YMADS, YBGT IF RSTRT#}


PCTCXFR
{/MSEL IF YMEOC# . WMSWND . RSTRT#}


YSWEHITM# . PXSAS . (MNA + WMNA) . (MHOLD# + YMLOCK)


YMEOC# . WMSWND
{/YMADS, YMSEL, YBGT}

{/YMADS, YBGT}

PSWBAS . MBOFFI


MHOLD#


{/SIGNAL} = SIGNAL ASSERTED
YMEOC#
YMEOC#


240957-54
YMBTRCK
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(YNOPIPE+MHLDA).YMEOC YMADS.MWR.(YNOPIPE+MHLDA) 
+YPIPE.YMEOC.MWR# +YPIPE.MWR.YMEOC 


+YNOPIPE#.YPIPE#.MHLDA# 


YMEOC#.(YNOPIPE+YPIPE+MHLDA)
+YMEOC.YPIPE.MWR


240957-55
YMDOE


YNOPIPE.(MNA.YMADS#+WMNA).WMSWND
+YPIPE#.YMEOC


YPIPE+YMEOC#.YNOPIPE#
+YMEOC#.YNOPIPE.[(MNA.YMADS#+WMNA).WMSWND]#


240957-56
YMEMALE


MBR0.MLEN1.MBRDY.WMSWND.MABORT#
+MBR1.MLEN1.WMSWND.MABORT#
+MBR1.MLEN2.MBRDY.WMSWND.MABORT#
+MBR2.MLEN2.WMSWND.MABORT#
+MBR3.MLEN4.MBRDY.WMSWND.MABORT#.TR4
+MBR4.MLEN4.WMSWND.MABORT#.TR4
+MBR7.MLEN4.MBRDY.WMSWND.MABORT#
+MBR8.MLEN4.WMSWND.MABORT#
+YALLOC.MABORT


{/YMFRZ IF YALLOC.MABORT}


240957-57


YMEMEOC
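

The YMEMEOC sum of products above can be read as a simple predicate. The sketch below is an interpretation, not the PLD source: it assumes that MBR0 through MBR8 are mutually exclusive states (passed as an integer), that MLEN1, MLEN2, and MLEN4 are mutually exclusive lengths (also an integer), and that every other argument is True when the named signal is asserted.

```python
def ymemeoc(mbr, mlen, mbrdy, wmswnd, mabort, tr4=False, yalloc=False):
    """Evaluate the memory end-of-cycle condition from the diagram label.

    mbr  : index n of the active MBRn state (assumed one-hot in hardware)
    mlen : 1, 2, or 4, for MLEN1, MLEN2, or MLEN4
    """
    # Last product term: an aborted allocation ends the cycle immediately.
    if yalloc and mabort:
        return True
    # Every other term requires WMSWND asserted and MABORT# (no abort).
    if not wmswnd or mabort:
        return False
    return bool(
        (mbr == 0 and mlen == 1 and mbrdy)
        or (mbr == 1 and mlen == 1)
        or (mbr == 1 and mlen == 2 and mbrdy)
        or (mbr == 2 and mlen == 2)
        or (mbr == 3 and mlen == 4 and mbrdy and tr4)
        or (mbr == 4 and mlen == 4 and tr4)
        or (mbr == 7 and mlen == 4 and mbrdy)
        or (mbr == 8 and mlen == 4)
    )


if __name__ == "__main__":
    # Single transfer: the cycle ends when MBRDY arrives in state MBR0.
    print(ymemeoc(mbr=0, mlen=1, mbrdy=True, wmswnd=True, mabort=False))
```

Writing the terms this way makes it easy to see that the TR4 terms end a four-transfer cycle early (in MBR3 or MBR4), while without TR4 the cycle runs to MBR7 or MBR8.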


YMEOC# + YPIPE.L1 


Ze 


YMEOC. YPIPE. L1 
YMEOC# + YPIPE.L2 


YMEOC. (YPIPE# + L4) 
YMEOC . YPIPE.L1 


YMEOC. YPIPE.L3 
YMSEL# 


LEN#.(XLKCACHE#+MKEN#.XLRDYSRC) 
LEN.(XLKCACHE#+MKEN#.XLRDYSRC) 
XLKCACHE. (MKEN+XLRDYSRC#) 


240957-58 
YMEMLEN 
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YBGT.KLOCK# © YBGT.KLOCK 
-YALLOC# +YALLOC 
+HBASWB# +HBASWB 


240957-59 


YBGT.(SNPDIS+MMIO+ 
MWR.MCACHE+MWR# 
.XLRDYSRC#.RFO)# 


io 240957-60 
YMSNPSTB 
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‘ (YMEOC# + MBOFFI) . MBRDY # 
YMEOC . YPIPE . MBRDY


YMEOC . MBRDY
YMEOC . YPIPE . MBRDY


(YMEOC# + MBOFFI) . MBRDY#


+ YMEOC . YPIPE . MBRDY#
YMEOC . YPIPE# . MBOFFI#


YMEOC . MBRDY#


YMEOC . YPIPE . MBRDY#
YMEOC . YPIPE# . MBOFFI#


(YMEOC# + MBOFFI) . MBRDY


MBRDY#
MBOFFI . MBRDY


TR4 . MBRDY . MBOFFI
/CTCEND


MBRDY#
/CTCEND


YMEOC . MBRDY# 


MRESET + MABORT 
(MBOFFI# + TR4#) . MBRDY 


YMEOC . MBRDY 


YMEOC# . MBRDY 
MBRDY# 


YMEOC# . MBRDY# 


MBRDY # 


240957 -61 
YRDYSTR 


YSWEHITM.MWR.YALLOC# 
+CTCDIS# 


+CTCDIS.(YSWEHITM+PCTCXFR) 


PCTCXFR# 


PCTCXFR 240957 -62 
YRSTRT 
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YMADS 
+ YMEOC 
+ YNOPIPE# 


YNOPIPE . YMADS# . MNA
+ YPIPE . YMEOC . MNA . YMADS#


YMADS# . YMEOC# . YNOPIPE 


YWMNA 


240957-63 


MRESET 


ene vene YBGT.(SNPDIS+MMIO#+MWR 
‘SNPDIS#.MMIO.(MWR .-MCACHE)+YNOPIPE.SNPDIS# 
“MCACHE) #.(YMSWEND# .MMIO.(MWR.MCACHE)#. YMSWEND 
+ENMSWND)# +YALLOC. YMSWEND.ENMSWND 


YPIPE.(SNPDIS+MMIO#4+MWR 
.MCACHE).YMEOC#+YPIPE 
-SNPDIS#.MMIO.(MWR 
.MCACHE) #.YMSWEND 
-ENMSWND 


YSWEHITMI = YMSWEND.MHITMI.ENMSWND.(YALLOC#+YNOPIPE#.YPIPE#) 


YWMSWND 
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intel. 
PLD CODES 


TITLE AYMBTRCK
PATTERN A
REVISION 2.0
AUTHOR ISIC SILAS
COMPANY INTEL
DATE 2/4/91
CHIP x01 85C22V10


; This PLD contains the YMBTRCK state machine.


;---------------- PIN Declarations ----------------


PIN 1 MCLK COMBINATORIAL ;
PIN 2 MRESET COMBINATORIAL ;
PIN 3 /WMSWND COMBINATORIAL ;
PIN 4 /MBOFFI COMBINATORIAL ;
PIN 5 /PXSAS COMBINATORIAL ;
PIN 6 /PSWBAS COMBINATORIAL ;
PIN 7 MHOLD COMBINATORIAL ;
PIN 8 /MNA COMBINATORIAL ;
PIN 9 /WMNA COMBINATORIAL ;
PIN 10 /YMLOCK COMBINATORIAL ;
PIN 11 /YSWEHITM COMBINATORIAL ;
PIN 12 GND
PIN 13 /PCTCXFR COMBINATORIAL ;
PIN 14 /RSTRT COMBINATORIAL ;
PIN 15 /YMEOC COMBINATORIAL ;
PIN 16 UNUSED registered ;
PIN 17 /YBGT registered ;
PIN 18 /YMADS registered ;
PIN 19 /MAOE registered ;
PIN 20 /YNOPIPE registered ;
PIN 21 /YMSTR registered ;
PIN 22 /YPIPE registered ;
PIN 23 /YMSEL registered ;
PIN 24 VCC
;---------------- Boolean Equation Segment ----------------
EQUATIONS
YNOPIPE := /MRESET * PXSAS * /MHOLD * YMEOC * YNOPIPE
 + /MRESET * PXSAS * YMLOCK * YMEOC * YNOPIPE
 + /MRESET * /PXSAS * /YMEOC * /YSWEHITM * YNOPIPE
 + /MRESET * /YMEOC * /WMSWND * /YSWEHITM * YNOPIPE
 + /MRESET * YMEOC * /YSWEHITM * /PCTCXFR * YPIPE
 + /MRESET * /PCTCXFR * RSTRT * YMSTR * /MAOE
 + /MRESET * /MNA * /WMNA * /YMEOC * /YSWEHITM * YNOPIPE
 + /MRESET * MHOLD * /YMLOCK * /YMEOC * /YSWEHITM * YNOPIPE
 + /MRESET * PXSAS * /MHOLD * /PCTCXFR * YMSTR * /MAOE
 + /MRESET * PXSAS * YMLOCK * /PCTCXFR * YMSTR * /MAOE
 + /MRESET * PXSAS * /MHOLD * /MBOFFI * /YMSTR * /MAOE
 + /MRESET * PXSAS * /MHOLD * /YSWEHITM * /PCTCXFR * /YNOPIPE * /YPIPE
   * YMSTR
 + /MRESET * PXSAS * YMLOCK * /YSWEHITM * /PCTCXFR * /YNOPIPE * /YPIPE
   * YMSTR
YPIPE := /MRESET * /YMEOC * YPIPE
 + /MRESET * PXSAS * /MHOLD * MNA * /YMEOC * WMSWND * /YSWEHITM
   * YNOPIPE
 + /MRESET * PXSAS * /MHOLD * WMNA * /YMEOC * WMSWND * /YSWEHITM
   * YNOPIPE
 + /MRESET * PXSAS * MNA * YMLOCK * /YMEOC * WMSWND * /YSWEHITM
   * YNOPIPE
 + /MRESET * PXSAS * WMNA * YMLOCK * /YMEOC * WMSWND * /YSWEHITM
   * YNOPIPE
YMSTR := /MRESET * YPIPE


240957-65 
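

The five product terms of the YPIPE equation in the listing above are the minimized form of a more compact expression. The sketch below checks this exhaustively; it is an illustration only, and it assumes that each name in an equation is True when that signal is asserted, regardless of the pin polarity declared with "/". The dictionary keys are lower-case copies of the listing's signal names.

```python
from itertools import product

NAMES = ["mreset", "pxsas", "mhold", "mna", "wmna", "ymlock",
         "ymeoc", "wmswnd", "yswehitm", "ynopipe", "ypipe"]


def ypipe_sum_of_products(s):
    """YPIPE next state, written term by term as in the PLD listing."""
    return bool(
        (not s["mreset"] and not s["ymeoc"] and s["ypipe"])
        or (not s["mreset"] and s["pxsas"] and not s["mhold"] and s["mna"]
            and not s["ymeoc"] and s["wmswnd"] and not s["yswehitm"] and s["ynopipe"])
        or (not s["mreset"] and s["pxsas"] and not s["mhold"] and s["wmna"]
            and not s["ymeoc"] and s["wmswnd"] and not s["yswehitm"] and s["ynopipe"])
        or (not s["mreset"] and s["pxsas"] and s["mna"] and s["ymlock"]
            and not s["ymeoc"] and s["wmswnd"] and not s["yswehitm"] and s["ynopipe"])
        or (not s["mreset"] and s["pxsas"] and s["wmna"] and s["ymlock"]
            and not s["ymeoc"] and s["wmswnd"] and not s["yswehitm"] and s["ynopipe"])
    )


def ypipe_factored(s):
    """Same function with the common factors pulled out."""
    enter = (s["pxsas"] and (not s["mhold"] or s["ymlock"])
             and (s["mna"] or s["wmna"]) and s["wmswnd"]
             and not s["yswehitm"] and s["ynopipe"])
    return bool(not s["mreset"] and not s["ymeoc"] and (s["ypipe"] or enter))


def forms_agree():
    """Compare both forms over all 2**11 input combinations."""
    for values in product([False, True], repeat=len(NAMES)):
        s = dict(zip(NAMES, values))
        if ypipe_sum_of_products(s) != ypipe_factored(s):
            return False
    return True


if __name__ == "__main__":
    print(forms_agree())
```

The factored form shows the intent more directly: the machine stays in YPIPE until YMEOC, and enters it from YNOPIPE when a new cycle is pending (PXSAS), the bus is not being given up (MHOLD# or YMLOCK), and a next-address indication (MNA or WMNA) has been seen with WMSWND and no YSWEHITM.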
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MAOE 


YMADS 


YBGT 


YMSEL 


UNUSED 


/MRESET 
/MRESET 
_/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 


+t+t+etett+ 


* 
* 
* 


* 
* 
* 


:=  /MRESET 


+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 


:= /MRESET 


+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
* YMSTR 

+ /MRESET 
* YMSTR 

/MRESET 


/MRESET 
/MRESET 
/MRESET 


+++ eH 


:= /MRESE 


/MRESET 
/MRESET 
/MRESET 


/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 


t+etttett+eteest 


/MRESET 


/MRESET 


* 


* 
* 
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/YMEOC * YNOPIPE 
YSWEHITM * YNOPIPE 
/MHOLD * YMSTR 


* YMLOCK * YMSTR 


/MHOLD * /MBOFFI * /MAOE 
PCTCXFR * YMSTR * /MAOE 
RSTRT * YMSTR * /MAOE 


* /PCTCXER * RSTRT * 
/YMEOC * YPIPE 
YMLOCK * YMEOC * YNOPIPE | 

/YMEOC * /YSWEHITM * YNOPIPE 

/YSWEHITM * /PCTCXFR * YPIPE 

/YMEOC * /YMSTR * MAOE 

/YSWEHITM * /PCTCXFR * YMSTR 


* YMSTR * /MAOE 


YMLOCK * 

/MHOLD * /PCTCXFR * YMSTR * /MAOE 

YMLOCK * /PCTCXFR * YMSTR * /MAOE 

/MHOLD * /MBOFFI * /YMSTR * /MAOE 

PSWBAS * MBOFFI * /YMSTR * /MAOE 

/MHOLD * /YMLOCK * /YMEOC * /YNOPIPE * MAOE 

/MHOLD * /YMLOCK * YMEOC * /YPIPE * YMSTR * MAOE 

/PXSAS * YMLOCK * /YNOPIPE * /YPIPE * YMSTR * MAOE 

* PXSAS * /MHOLD * YMEOC * YNOPIPE 

PXSAS * YMLOCK * YMEOC * YNOPIPE 

/PCTCXFR * RSTRT * YMSTR * /MAOE 

PXSAS * /MHOLD * /PCTCXFR * YMSTR * /MAOE 

PXSAS * YMLOCK * /PCTCXFR * YMSTR * /MAOE 

PXSAS * /MHOLD * /MBOFFI * /YMSTR * /MAOE 

PXSAS * /MHOLD * /YSWEHITM * /PCTCXFR * /YNOPIPE * /YPIPE 
PXSAS * YMLOCK * /YSWEHITM + * /PCTCXFR * /YNOPIPE * /YPIPE' 


PSWBAS * MBOFFI * /YMSTR * /MAOE 

PXSAS * /MHOLD * MNA * WMSWND * Fier co * YNOPIPE 
PXSAS * /MHOLD * WMNA * WMSWND * /YSWEHITM * YNOPIPE 
PXSAS * MNA * YMLOCK * WMSWND * /YSWEHITM * YNOPIPE 


PXSAS * WMNA * YMLOCK * WMSWND * /YSWEHITM * YNOPIPE 


* PXSAS * /MHOLD * YMEOC * YNOPIPE 
PXSAS * YMLOCK * YMEOC * YNOPIPE 
PXSAS * /MHOLD * /MBOFFI * /YMSTR * /MAOE 


PSWBAS * MBOFFI * /YMSTR *: /MAOE | 

PXSAS * /MHOLD * MNA * WMSWND * _/YSWEHITM * YNOPIPE 

PXSAS * /MHOLD * WMNA * WMSWND * /YSWEHITM * YNOPIPE 
PXSAS * MNA * YMLOCK * WMSWND * /YSWEHITM * YNOPIPE 
PXSAS * WMNA * YMLOCK * WMSWND * /YSWEHITM * YNOPIPE 
PXSAS * YMLOCK * /YNOPIPE * /YPIPE * YMSTR * MAOE 

PXSAS * /MHOLD * /PCTCXFR * /RSTRT * YMSTR * /MAOE 

PXSAS * YMLOCK * /PCTCXFR * /RSTRT * YMSTR * /MAOE 

PXSAS * /MHOLD * /YSWEHITM * /PCTCXFR * /YNOPIPE * /YPIPE 


* YMSTR * MAOE 


_/MRESET * 


re | /MRESET * 


{YMEOC * YPIPE 
/YMEOC * /YSWEHITM * YNOPIPE 


+ /MRESET * /YSWEHITM * /PCTCXFR * YPIPE 
+ /MRESET * /YMEOC * WMSWND * PCTCXFR * /RSTRT * YMSTR * /MAOE 


= VCC 
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2-514 


AP-452 PRELIMINARY


PATTERN A
REVISION 2.0
AUTHOR ISIC SILAS
COMPANY INTEL
DATE 2/4/91
CHIP x01 85C22V10


; This PLD contains the YMEMLEN and YCPUEOC state machines


;---------------- PIN Declarations ----------------


PIN 1 MCLK COMBINATORIAL ;
PIN 2 MRESET COMBINATORIAL ;
PIN 3 /YMSEL COMBINATORIAL ;
PIN 4 /YPIPE COMBINATORIAL ;
PIN 5 /MABORT COMBINATORIAL ;
PIN 6 /MBRDY COMBINATORIAL ;
PIN 7 /WMSWND COMBINATORIAL ;
PIN 8 XLRDYSRC COMBINATORIAL ;
PIN 9 /XLKCACHE COMBINATORIAL ;
PIN 10 /MKEN COMBINATORIAL ;
PIN 11 LEN COMBINATORIAL ;
PIN 12 GND
PIN 13 /CACHE COMBINATORIAL ; INPUT
PIN 14 /YMEOC COMBINATORIAL ; INPUT
PIN 15 /SVRO COMBINATORIAL ; INPUT
PIN 16 /SVR1 COMBINATORIAL ; INPUT
PIN 17 /SVR2 COMBINATORIAL ; INPUT
PIN 18 /SVR3 COMBINATORIAL ; INPUT
PIN 19 /YCEOC registered ;
PIN 20 /SVLO registered ;
PIN 21 /SVL1 registered ;
PIN 22 /SVCO registered ;
PIN 23 /SVC1 registered ;
PIN 24 VCC
;---------------- Boolean Equation Segment ----------------
EQUATIONS
SVL1 = /YMEOC * /MRESET * SVL1 
+ YPIPE * LEN * /XLKCACHE * /MRESET * SVL1 
+ YMSEL * LEN * /MRESET * /SVL1 * /SVLO 
+ YPIPE * LEN * XLRDYSRC * /MKEN * /MRESET * SVL1 
+ YPIPE * YMEOC * LEN * /XLKCACHE * /MRESET * SVLO 
+ YMSEL * /XLRDYSRC * XLKCACHE * /MRESET * /SVL1 * /SVLO 
+ YMSEL * XLKCACHE * MKEN * /MRESET * /SVL1 * /SVLO 
+ YPIPE * YMEOC * LEN * XLRDYSRC * /MKEN * /MRESET * SVLO 
SVLO = YMSEL * /XLRDYSRC * XLKCACHE * /MRESET * /SVL1 * /SVLO 
+ YMSEL * XLKCACHE * MKEN * /MRESET * /SVL1 * /SVLO 
+ /YMEOC * /MRESET * SVLO 
+ YPIPE * /LEN * /XLKCACHE * /MRESET * SVLO 
+ YMSEL * /LEN * /MRESET * /SVL1 * /SVLO 
+ YPIPE * YMEOC * /LEN * /XLKCACHE * /MRESET * SVL1 
+ YPIPE * /LEN * XLRDYSRC * /MKEN * /MRESET * SVLO 
+ YPIPE * YMEOC * /LEN * XLRDYSRC * /MKEN * /MRESET * SVL1 
SVC1 = /YMEOC * /YCEOC * /MRESET * SVC1
 + YMSEL * XLRDYSRC * /MRESET * /SVC1 * /SVCO
 + YPIPE * YMEOC * /CACHE * XLRDYSRC * /MRESET * SVC1
 + YPIPE * YMEOC * /CACHE * XLRDYSRC * /MRESET * SVCO
SVCO = /YMEOC * /MRESET * SVCO
 + /YMEOC * YCEOC * /MRESET * SVC1
 + YPIPE * /XLRDYSRC * /MRESET * SVCO
 + YPIPE * YMEOC * /XLRDYSRC * /MRESET * SVC1
 + YMSEL * CACHE * /MRESET * /SVC1 * /SVCO
 + YMSEL * /XLRDYSRC * /MRESET * /SVC1 * /SVCO


YCEOC := SVR3 * /SVR2 * /SVR1 * SVL1 * SVLO * SVC1 * WMSWND
   * /MABORT * /MRESET * /YCEOC
 + SVR3 * /SVR1 * /SVRO * SVL1 * SVLO * SVC1 * WMSWND
   * /MABORT * /MRESET * /YCEOC
 + /SVR3 * /SVR2 * SVR1 * /SVRO * SVL1 * /SVLO * SVC1
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * /SVR2 * /SVR1 * SVRO * /SVL1 * SVLO * SVC1
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * SVR2 * SVR1 * /SVRO * SVL1 * SVLO * SVC1
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * /SVR2 * SVR1 * SVRO * SVL1 * SVLO * SVC1
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * /SVR2 * SVR1 * /SVRO * SVL1 * SVC1 * /SVCO
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * SVR2 * /SVRO * SVL1 * SVLO * SVC1 * /SVCO
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * /SVR2 * /SVR1 * /SVL1 * SVLO * SVC1 * MBRDY
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * SVR2 * /SVRO * SVL1 * SVLO * SVC1 * MBRDY
   * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * /SVR2 * /SVR1 * SVRO * SVL1 * /SVLO * SVC1
   * MBRDY * WMSWND * /MABORT * /MRESET * /YCEOC
 + /SVR3 * /SVR2 * /SVR1 * SVRO * SVL1 * SVC1 * /SVCO
   * MBRDY * WMSWND * /MABORT * /MRESET * /YCEOC
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TITLE BYRDYSTR 


PATTERN A
REVISION 2.0
AUTHOR ISIC SILAS
COMPANY INTEL
DATE 2/4/91
CHIP x01 85C22V10


; This PLD contains the YRDYSTR, YRDYSTR, and YMEMEOC state machines.


5 HH 


EQUATIONS 


SVR3 


SVR2 


SVRL 


SVRO 


+ 
+ 
+ 
+ 
+ 


+ 
+ 
+ 
+ 


t++ettet 


+ 
+ 


/YALLOC 


/MABORT 
/MBRDY 
/WMSWND 
/MBOFFI 
/YMSEL 
/YPIPE 
GND 
/SVL1 
/SVLO 
/SVR3 
/SVR2 
/SVR1 
/SVRO 
/YMEOC1 
/YMEOC 




PIN Declarations 
COMBINATORIAL ; 
COMBINATORIAL ; 


COMBINATORIAL ; 


COMBINATORIAL ; 


COMBINATORIAL ; 
COMBINATORIAL ; 

COMBINATORIAL ; 

COMBINATORIAL ; 
COMBINATORIAL ; 
COMBINATORIAL ; 


COMBINATORIAL ; INPUT 
COMBINATORIAL ; INPUT 
registered ; 
registered ; 
registered ; 
registered ; 
registered ; 
registered ; 

registered ; 
registered ; 


/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 


/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 


/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 
/MRESET 


/MRESET 
/MRESET 
/MRESET 


registered ; 


Boolean Equation Segment 


/YMEOC * /MBRDY * /MABORT * SVR3- 
MBRDY * /MABORT * /MBOFFI * SVR2 
/MBRDY * /MABORT * SVR3 * SVR2 

MBRDY * /MABORT * SVR2 * SVR1 

/YMEOC * /MABORT * SVR3 * SVRO 

MBRDY * /MABORT * /TR4 * /SVR3 * SVR2 


/MBRDY * /MABORT * SVR2 

/MABORT * SVR2 * SVR1 

/YMEOC * MBRDY * /MABORT * SVR1 
MBRDY * /MABORT * MBOFFI * SVR1 
MBRDY * /MABORT * SVR1 * SVRO 


/MABORT * SVR1 * SVRO 

/YMEOC * /MBRDY * /MABORT * ‘svrl 

/MBRDY * /MABORT * MBOFFI * SVR1 

/MBRDY * /MABORT * SVR2 * SVR1 | 

/YMEOC * MBRDY * /MABORT * /SVR3 * SVRO 

MBRDY * /MABORT * MBOFFI * /SVR3 * SVRO 

/YMEOC * MBRDY * /MABORT * SVR3 * /SVR2 * /SVRO 


MBRDY * /MABORT * /MBOFFI * SVR3 


MBRDY * /MABORT * SVR3 * /SVR2 
/YMEOC * /MBRDY * /MABORT * SVRO 
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/MRESET * /MBRDY * /MABORT * SVR1 * SVRO 

/MRESET * /MBRDY * /MABORT * MBOFFI * /SVR3 * SVRO 

/MRESET * YPIPE * YMEOC * MBRDY * /MABORT * /SVR1 * SVRO 
/MRESET * YMSEL * MBRDY * /MABORT * /SVR2 * /SVR1 * /SVRO 
/MRESET * MBRDY * /MABORT * MBOFFI * /SVR2 * /SVR1 * /SVRO 
/MRESET * YPIPE * YMEOC * MBRDY * /MABORT * /SVR2 * SVR1_ 
* /SVRO | 


CTCEND = /MRESET * MBRDY * /MABORT * MBOFFI * SVR3 * SVR2
 + /MRESET * MBRDY * /MABORT * MBOFFI * TR4 * SVR2 * /SVR1


YMEOC = /MRESET * MABORT * YALLOC * /YMEOC * /SV
 + /MRESET * /SVR3 * /SVR2 * SVR1 * /SVRO * SVL1 * /SVLO
   * WMSWND * /MABORT * /YMEOC * /SV
 + /MRESET * /SVR3 * /SVR2 * /SVR1 * SVRO * /SVL1 * SVLO
   * WMSWND * /MABORT * /YMEOC * /SV
 + /MRESET * SVR3 * /SVR2 * /SVR1 * SVRO * SVL1 * SVLO
   * WMSWND * /MABORT * /YMEOC * /SV
 + /MRESET * SVR3 * /SVR2 * /SVR1 * SVL1 * SVLO * TR4 * WMSWND
   * /MABORT * /YMEOC * /SV
 + /MRESET * /SVR3 * /SVR2 * /SVR1 * /SVL1 * SVLO * MBRDY
   * WMSWND * /MABORT * /YMEOC * /SV
 + /MRESET * /SVR3 * /SVR2 * /SVR1 * SVRO * SVL1 * /SVLO
   * MBRDY * WMSWND * /MABORT * /YMEOC * /SV
 + /MRESET * SVR3 * SVR2 * /SVR1 * /SVRO * SVL1 * SVLO
   * MBRDY * WMSWND * /MABORT * /YMEOC * /SV
 + /MRESET * SVR2 * /SVR1 * /SVRO * SVL1 * SVLO * TR4 * MBRDY
   * WMSWND * /MABORT * /YMEOC * /SV


SV = /MRESET * YMEOC


/YMFRZ = MRESET
 + /MABORT
 + /YALLOC
 + YMEOC
 + SV
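

The /YMFRZ equation is written for the inactive sense of the output. Applying De Morgan's theorem gives the asserted condition, YMFRZ = /MRESET * MABORT * YALLOC * /YMEOC * /SV, which is the same product as the first term of the YMEOC equation. The sketch below verifies the equivalence exhaustively; the function names are hypothetical, and each argument is assumed to be True when the named signal is asserted.

```python
from itertools import product


def ymfrz_inactive(mreset, mabort, yalloc, ymeoc, sv):
    """/YMFRZ as listed: True when YMFRZ is NOT asserted."""
    return mreset or not mabort or not yalloc or ymeoc or sv


def ymfrz_asserted(mreset, mabort, yalloc, ymeoc, sv):
    """Asserted form obtained by applying De Morgan's theorem."""
    return not mreset and mabort and yalloc and not ymeoc and not sv


def equivalent():
    """Check that the two forms are complements for all 32 input combinations."""
    return all(
        ymfrz_asserted(*values) == (not ymfrz_inactive(*values))
        for values in product([False, True], repeat=5)
    )


if __name__ == "__main__":
    print(equivalent())
```

Seen in its asserted form, YMFRZ is active exactly when an allocation is aborted (YALLOC and MABORT) and the end-of-cycle indication has not yet been generated, which agrees with the {/YMFRZ IF YALLOC.MABORT} annotation on the YMEMEOC state diagram.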


YMEOC1 = /MRESET * MABORT * YALLOC * /YMEOC1 * /SV
 + /MRESET * /SVR3 * /SVR2 * SVR1 * /SVRO * SVL1 * /SVLO
   * WMSWND * /MABORT * /YMEOC1 * /SV
 + /MRESET * /SVR3 * /SVR2 * /SVR1 * SVRO * /SVL1 * SVLO
   * WMSWND * /MABORT * /YMEOC1 * /SV
 + /MRESET * SVR3 * /SVR2 * /SVR1 * SVRO * SVL1 * SVLO
   * WMSWND * /MABORT * /YMEOC1 * /SV
 + /MRESET * SVR3 * /SVR2 * /SVR1 * SVL1 * SVLO * TR4 * WMSWND
   * /MABORT * /YMEOC1 * /SV
 + /MRESET * /SVR3 * /SVR2 * /SVR1 * /SVL1 * SVLO * MBRDY
   * WMSWND * /MABORT * /YMEOC1 * /SV
 + /MRESET * /SVR3 * /SVR2 * /SVR1 * SVRO * SVL1 * /SVLO
   * MBRDY * WMSWND * /MABORT * /YMEOC1 * /SV
 + /MRESET * SVR3 * SVR2 * /SVR1 * /SVRO * SVL1 * SVLO
   * MBRDY * WMSWND * /MABORT * /YMEOC1 * /SV
 + /MRESET * SVR2 * /SVR1 * /SVRO * SVL1 * SVLO * TR4 * MBRDY
   * WMSWND * /MABORT * /YMEOC1 * /SV
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PATTERN A 

REVISION 2.0
AUTHOR ISIC SILAS
COMPANY INTEL
DATE 2/5/91
CHIP x01 85C224


; This PLD contains the YABORT, YRSTRT, and YMEMDOE state machines.


;---------------- Pin Definitions ----------------


PIN 1 MCLK
PIN 2 MRESET
PIN 3 WMSWND
PIN 4 YSWEHITM
PIN 5 YALLOC
PIN 6 YPIPE
PIN 7 YNOPIPE
PIN 8 YMEOC
PIN 9 MHITMI


PIN 16 YMADS
PIN 23 CTCDIS
PIN 18 RSTRT
PIN 19 YMDOE
PIN 20 PCTCXFR
PIN 21 TRIABORT
PIN 22 MABORT
PIN 17 SV ;Swapped pins 23 and 17 to fit 85C224


EQUATIONS 
/RSTRT.D := /MRESET * /PCTCXFR * /CTCDIS 
+ /MRESET * /PCTCXFR * /RSTRT 
+ /MRESET * /YSWEHITM * /CTCDIS * RSTRT 
+ /MRESET * /YSWEHITM * YWR * YALLOC * RSTRT 
RSTRT.CLKF = MCLK 
RSTRT.RSTF = GND 
RSTRT.SETF = GND 
RSTRT.TRST = VCC 


/YMDOE.D := /MRESET * YWR * /YPIPE * /YMEOC
 + /MRESET * /YNOPIPE * YMEOC * /YMDOE
 + /MRESET * MHLDA * YMEOC * /YMDOE
 + /MRESET * /YPIPE * YMEOC * /YMDOE
 + /MRESET * /YMADS * YWR * /YNOPIPE * YMDOE
 + /MRESET * /YMADS * YWR * MHLDA * YMDOE
YMDOE.CLKF = MCLK
YMDOE.RSTF = GND
YMDOE.SETF = GND
YMDOE.TRST = VCC


/PCTCXFR.D := /MRESET * YALLOC * /MABORT
 + /MRESET * /YSWEHITM * /MAOE * PCTCXFR
 + /MRESET * /MHITMI * /WMSWND * /MABORT
 + /MRESET * CTCEND * /PCTCXFR * MABORT
 + /MRESET * /PCTCXFR * MABORT * /SV
 + /MRESET * /YSWEHITM * /YPIPE * YMEOC * PCTCXFR
 + /MRESET * /YPIPE * /YMEOC * /YALLOC * PCTCXFR
 + /MRESET * /YNOPIPE * YMEOC * /YALLOC * /MKEN * PCTCXFR
PCTCXFR.CLKF = MCLK 
PCTCXFR.RSTF 
PCTCXFR.SETF 
PCTCXFR.TRST 


/TRIABORT.D := /MRESET * /YPIPE * /YMEOC * /YALLOC * PCTCXFR * /MHLDA
 + /MRESET * /YNOPIPE * YMEOC * /YALLOC * /MKEN * PCTCXFR
   * /MHLDA
 + /MRESET * /YSWEHITM * /MAOE * YPIPE * PCTCXFR * /MHLDA
 + /MRESET * /YSWEHITM * /MAOE * /YMEOC * PCTCXFR * /MHLDA
 + /MRESET * /YMEOC * /PCTCXFR * MABORT * /SV * /MHLDA
TRIABORT.CLKF = MCLK
TRIABORT.RSTF = GND
TRIABORT.SETF = GND
TRIABORT.TRST = /MHLDA


/MABORT.D := /MRESET * /YPIPE * /YMEOC * /YALLOC * PCTCXFR
 + /MRESET * /YNOPIPE * YMEOC * /YALLOC * /MKEN * PCTCXFR
 + /MRESET * /YSWEHITM * /MAOE * YPIPE * PCTCXFR
 + /MRESET * /YSWEHITM * /MAOE * /YMEOC * PCTCXFR
 + /MRESET * /YMEOC * /PCTCXFR * MABORT * /SV
MABORT.CLKF
MABORT.RSTF
MABORT.SETF =
MABORT.TRST


/SV.D := MABORT * /SV
 + PCTCXFR
SV.CLKF = MCLK
SV.RSTF = GND
SV.SETF = GND
SV.TRST = VCC
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PATTERN A 
REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/4/91 


CHIP x01 85C224 
; This PLD contains the XASTB, XSTFAIL, and XDTSTRCK state machines


;---------------- Pin Declarations ----------------


PIN 1 CLK
PIN 2 RESET
PIN 3 CADS
PIN 4 CDTS
PIN 5 SNPADS
PIN 6 CWR
PIN 7 YSBGT
PIN 8 CRDY
PIN 9 CAHOLD


PIN 16 LRDYSRC
PIN 17 SV2
PIN 18 STFAIL
PIN 19 SV1
PIN 20 XSNPWB
PIN 21 WSDTS
PIN 22 XAS


; OE control inverted during design conversion. 
EQUATIONS 


LRDYSRC.D := RDYSRC 
LRDYSRC.CLKF = CLK 
LRDYSRC.RSTF = GND 
LRDYSRC.SETF = GND 
/LRDYSRC.TRST = OEx 


/SV2.D := /RESET * FSIOUT * /SLFTST * STFAIL * SV2 
SV2.CLKF = CLK 

SV2.RSTF = GND 

SV2.SETF = GND 

/SV2.TRST = OEx 


/STFAIL.D := /RESET * FSIOUT * /CAHOLD * /SLFTST * STFAIL * SV2 
STFAIL.CLKF = CLK 

STFAIL.RSTF = GND 

STFAIL.SETF = GND 

/STFAIL.TRST = OEx 


/SVL.D := /RESET * CDTS * CRDY * /SV1 

+ /RESET * CDTS * WSDTS * /SV1 

+ /RESET * /SNPADS * XSNPWB * SV1 

+ /RESET * /CDTS * CRDY * /WSDTS * XSNPWB 
SV1.CLKF = CLK 
SV1.RSTF = GND 
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SV1.SETF = GND 
/SV1.TRST = OEx 


/XSNPWB.D := /RESET * CRDY * /XSNPWB 
+ /RESET * /CDTS * WSDTS * /SV1 
XSNPWB.CLKF = CLK 
XSNPWB.RSTF = GND 
XSNPWB.SETF = GND 
/XSNPWB.TRST = OEx 


/WSDTS.D := /RESET * CRDY * /XSNPWB 

+ /RESET * /CDTS * XSNPWB 

+ /RESET * /WSDTS * /SV1 . 

+ /RESET * SNPADS * CRDY * /WSDTS 
WSDTS.CLKF = CLK | 
WSDTS.RSTF = GND 
WSDTS.SETF = GND 
/WSDTS.TRST = OEx 


/XAS.D := /RESET * SNPADS * YSBGT * /XAS 
+ /RESET * /CDTS * CWR * XAS 
+ /RESET * /CADS * /CWR * XAS 
XAS.CLKF = CLK 
XAS.RSTF = GND 
XAS.SETF = GND 
/XAS.TRST = OEx 


240957-74
TITLE EBGTKWN


PATTERN 

REVISION 1.0 
AUTHOR 

COMPANY INTEL 
DATE 


CHIP INTEL 85C224 


; This PLD contains the XBGTKWND, XCNA and XENBGT state machines


PIN 1 CLK 
PIN 2 RESET 
PIN 3 YSBGT 
PIN 4 CRDY 
PIN 5 C8LDRV
PIN 6 TR4 
PIN 7 NC5 
PIN 8 NC6 
PIN 9 WCPLB 


PIN 17 ENBGT 


PIN 21 C5BGT
PIN 22 BGT 


EQUATIONS 


/CKENLC.D := /RESET * YSBGT * CRDY * /BGT * /KWEND 
+ /RESET * CRDY * ENBGT * /BGT * /KWEND 

CKENLC.CLKF = CLK 

CKENLC.RSTF = GND 

CKENLC.SETF = GND 

/CKENLC.TRST = OE 


ENBGT.D := /RESET * /YSBGT 
ENBGT.CLKF = CLK 

ENBGT.RSTF = GND
ENBGT.SETF = GND 
/ENBGT.TRST = OE 


/CNA.D := /RESET * CRDY * /CNA 


+ /RESET * /YSBGT * WCPLB * CNADIS 
+ /RESET * /YSBGT * WCPLB * /CNA 
+ /RESET * /PBGT * WCPLB * /CNA 
+ /RESET * /BGT * WCPLB * CNADIS * CNA 
CNA.CLKF = CLK 
CNA.RSTF = GND 
CNA.SETF = GND 


/CNA.TRST = OE 


/PBGT.D := /RESET * CRDY * /PBGT 
+ /RESET * /YSBGT * CRDY * /ENBGT * /BGT * /KWEND
PBGT.CLKF = CLK
PBGT.RSTF = GND 

PBGT.SETF = GND 

/PBGT.TRST = OE 

/KWEND.D := /RESET * /BGT * KWEND
+ /RESET * /CRDY * /PBGT
+ /RESET * YSBGT * CRDY * /BGT
+ /RESET * CRDY * ENBGT * /BGT
+ RESET * TR4
KWEND.CLKF = CLK

KWEND.RSTF = GND

KWEND.SETF = GND

/KWEND.TRST = OE


/C5BGT.D := /RESET * /BGT * KWEND
+ /RESET * /CRDY * /PBGT
+ /RESET * YSBGT * CRDY * /BGT
+ /RESET * CRDY * ENBGT * /BGT
+ /RESET * /YSBGT * /CRDY * /ENBGT * /BGT
+ /RESET * /YSBGT * /ENBGT * C5BGT * KWEND * PBGT
+ RESET * /C8LDRV
C5BGT.CLKF = CLK
C5BGT.RSTF = GND
C5BGT.SETF = GND
/C5BGT.TRST = OE


/BGT.D := /RESET * /BGT * KWEND
+ /RESET * /CRDY * /PBGT
+ /RESET * YSBGT * CRDY * /BGT
+ /RESET * CRDY * ENBGT * /BGT
+ /RESET * /YSBGT * /CRDY * /ENBGT * /BGT
+ /RESET * /YSBGT * /ENBGT * C5BGT * KWEND * PBGT
BGT.CLKF = CLK
BGT.RSTF = GND
BGT.SETF = GND
/BGT.TRST = OE
PATTERN A
REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/4/91


CHIP x01 85C224 


; This PLD contains the XBRDY, XMSWNDO, and XCTRCK state machines. 


; ---------------- Pin Declarations ----------------


PIN 1 CLK 

PIN 2 RESET 
PIN 3 CLEN1 
PIN 4 WSDTS 
PIN 5 YSCEOC 
PIN 6 SNPCYC 
PIN 7 MSNPSTB 
PIN 8 CKENLC 
PIN 9 MKEN 
PIN 13 OEx 


PIN 17 SV 

PIN 18 PNDCEOC 
PIN 19 ENBRDY 
PIN 20 BRDY 
PIN 21 BRDY1 
PIN 22 MSWNDO 


EQUATIONS 


/CKEN = CKENLC * /MKEN 
+ /CKENLC * /CKEN 
+ /MKEN * /CKEN 

CKEN.TRST = VCC 


/SV.D := /RESET * BRDY * /SV 
+ /RESET * CLEN1 * /SV 
+ /RESET * /YSCEOC * BRDY * ENBRDY * /PNDCEOC 
+ /RESET * /YSCEOC * CLEN1 * ENBRDY * /PNDCEOC 
SV.CLKF = CLK 
SV.RSTF = GND 
SV.SETF = GND 
/SV.TRST = OEx 


/PNDCEOC.D := /RESET * /SV
+ /RESET * BRDY * /PNDCEOC
+ /RESET * CLEN1 * /PNDCEOC
+ /YSCEOC * ENBRDY * /PNDCEOC
+ /YSCEOC * /ENBRDY * PNDCEOC
PNDCEOC.CLKF = CLK
PNDCEOC.RSTF = GND 
PNDCEOC.SETF = GND 
/PNDCEOC.TRST = OEx 


/ENBRDY.D := YSCEOC * PNDCEOC
+ /ENBRDY * PNDCEOC
+ YSCEOC * /BRDY * /CLEN1
+ /YSCEOC * BRDY * /ENBRDY
+ /YSCEOC * CLEN1 * /ENBRDY
+ /BRDY * /CLEN1 * ENBRDY * /PNDCEOC
+ RESET

ENBRDY.CLKF = CLK 

ENBRDY.RSTF = GND 

ENBRDY.SETF = GND 

/ENBRDY.TRST = OEx 


/BRDY.D := /RESET * CLEN1 * /BRDY 

+ /RESET * /PNDCEOC * /WSDTS * BRDY 

+ /RESET * /YSCEOC * /ENBRDY * /WSDTS * BRDY
BRDY.CLKF = CLK
BRDY.RSTF = GND
BRDY.SETF = GND
/BRDY.TRST = OEx


/BRDY1.D := /RESET * CLEN1 * /BRDY1 
+ /RESET * /PNDCEOC * /WSDTS * BRDY1 
+ /RESET * /YSCEOC * /ENBRDY * /WSDTS * BRDY1 
BRDY1.CLKF = CLK 
BRDY1.RSTF = GND 
BRDY1.SETF = GND 
/BRDY1.TRST = OEx 


/MSWNDO = /RESET * /SNPCYC 
+ /RESET * MSNPSTB * /MSWNDO 
MSWNDO.TRST = VCC 
PATTERN A
REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 
DATE 2/5/91 


CHIP x01 85C224 


; This PLD contains the YMEMLEN state machine


; ---------------- Pin Declarations ----------------


PIN 1 YMALE 
PIN 2 YMAOE 
PIN 3 MHLDA 
PIN 4 KCACHE 
PIN 5 CWR 

PIN 6 CMIO 
PIN 7 CDC 

PIN 8 LLEN 
PIN 9 CSMTHIT 


PIN 10 YSNPDIS 
PIN 11 NC1
PIN 13 NC2 
PIN 14 YNOSWNDI 
PIN 23 YWRI 


PIN 21 MCACHE
PIN 22 YNOSWND 


EQUATIONS 


/YWR = YMALE * /CWR 
+ /YMALE * /YWRI 
+ /CWR * /YWRI 

YWR.TRST = VCC 
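
The YWR equation above has the shape of a transparent latch with a redundant consensus term. The following Python sketch evaluates it, assuming (this is not stated in the listing) that the YWRI input on pin 23 is simply the YWR output fed back. Under that assumption YWR follows CWR while YMALE is high and holds its value while YMALE is low. The function names are hypothetical.

```python
def ywr_next(ymale: bool, cwr: bool, ywri: bool) -> bool:
    """Evaluate /YWR = YMALE*/CWR + /YMALE*/YWRI + /CWR*/YWRI.

    Returns the resulting level of YWR (True = high).
    """
    ywr_low = (
        (ymale and not cwr)          # latch open: pass CWR through
        or (not ymale and not ywri)  # latch closed: hold the fed-back value
        or (not cwr and not ywri)    # consensus term, covers the hand-over
    )
    return not ywr_low


def settle(ymale: bool, cwr: bool, ywr: bool) -> bool:
    """Feed YWR back as YWRI (an assumption) until the output is stable."""
    for _ in range(4):
        nxt = ywr_next(ymale, cwr, ywr)
        if nxt == ywr:
            break
        ywr = nxt
    return ywr
```

With YMALE high, `settle(True, cwr, anything)` returns `cwr`; with YMALE low it returns the previous value unchanged.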


/MTHIT = /CSMTHIT * /MWR * MHLDA 
MTHIT.TRST = MHLDA


/MLEN = YMALE * /LLEN * /YMAOE 
+ /YMALE * /MLEN * /YMAOE 

+ /LLEN * /MLEN * /YMAOE

MLEN.TRST = /YMAOE 


/MDC = YMALE * /CDC * /YMAOE 
+ /YMALE * /MDC * /YMAOE 
+ /CDC * /MDC * /YMAOE 
MDC.TRST = /YMAOE 


/MMIO = YMALE * /CMIO * /YMAOE 
+ /YMALE * /MMIO * /YMAOE 
+ /CMIO * /MMIO * /YMAOE 
MMIO.TRST = /YMAOE 


/MWR = YMALE * /CWR * /YMAOE 
+ /YMALE * /MWR * /YMAOE 
+ /CWR * /MWR * /YMAOE
MWR.TRST = /YMAOE 


/MCACHE = YMALE * /KCACHE * /YMAOE 
+ /YMALE * /MCACHE * /YMAOE 
+ /KCACHE * /MCACHE * /YMAOE 
MCACHE.TRST = /YMAOE 


/YNOSWND = YMALE * /YSNPDIS
+ YMALE * /CMIO
+ YMALE * CWR * /KCACHE
+ /YMALE * /YNOSWNDI
+ /YSNPDIS * /YNOSWNDI
+ /CMIO * /YNOSWNDI
+ CWR * /KCACHE * /YNOSWNDI
YNOSWND.TRST = VCC
PATTERN A
REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/5/91 


CHIP x01 85C224 


; This PLD generates the memory bus byte enables (MBEs) 


PIN 1 LBE0
PIN 2 LBE1
PIN 3 LBE2
PIN 4 LBE3
PIN 5 LBE4
PIN 6 LBE5
PIN 7 LBE6 
PIN 8 LBE7 
PIN 9 RDYSRC 


EQUATIONS 


/MBE7 = /YMALE * /LBE7 * /YMAOE
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE
+ YMALE * /MBE7I * /YMAOE
+ /LBE7 * /MBE7I * /YMAOE
+ /KCACHE * /RDYSRC * /MBE7I * /YMAOE
MBE7.TRST = /YMAOE


/MBE5 = /YMALE * /LBE5 * /YMAOE
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE
+ YMALE * /MBE5 * /YMAOE
+ /LBE5 * /MBE5 * /YMAOE
+ /KCACHE * /RDYSRC * /MBE5 * /YMAOE
MBE5.TRST = /YMAOE


/MBE4 = /YMALE * /LBE4 * /YMAOE
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE
+ YMALE * /MBE4 * /YMAOE
+ /LBE4 * /MBE4 * /YMAOE
+ /KCACHE * /RDYSRC * /MBE4 * /YMAOE
MBE4.TRST = /YMAOE


/MBE3 = /YMALE * /LBE3 * /YMAOE
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE
+ YMALE * /MBE3 * /YMAOE
+ /LBE3 * /MBE3 * /YMAOE
+ /KCACHE * /RDYSRC * /MBE3 * /YMAOE
MBE3.TRST = /YMAOE


/MBE2 = /YMALE * /LBE2 * /YMAOE 
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE 
+ YMALE * /MBE2 * /YMAOE
+ /LBE2 * /MBE2 * /YMAOE 
+ /KCACHE * /RDYSRC * /MBE2 * /YMAOE 
MBE2.TRST = /YMAOE 


/MBE1 = /YMALE * /LBE1 * /YMAOE
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE
+ YMALE * /MBE1 * /YMAOE
+ /LBE1 * /MBE1 * /YMAOE
+ /KCACHE * /RDYSRC * /MBE1 * /YMAOE
MBE1.TRST = /YMAOE


/MBE0 = /YMALE * /LBE0 * /YMAOE
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE
+ YMALE * /MBE0 * /YMAOE
+ /LBE0 * /MBE0 * /YMAOE
+ /KCACHE * /RDYSRC * /MBE0 * /YMAOE
MBE0.TRST = /YMAOE


/MBE6 = /YMALE * /LBE6 * /YMAOE
+ /YMALE * /KCACHE * /RDYSRC * /YMAOE
+ YMALE * /MBE6I * /YMAOE
+ /LBE6 * /MBE6I * /YMAOE
+ /KCACHE * /RDYSRC * /MBE6I * /YMAOE
MBE6.TRST = /YMAOE
PATTERN A
REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/4/91 


CHIP x01 85C224 


This PLD contains the YMALE, YMBRDY, YWMNA, 
and YIMSWND state machines. 


YIMSWND 
YDRCTM 
NC1_ 
YMALE 
DISWND 
WMNA 
YMBRDY 

PIN NC2 


EQUATIONS 


/YIMSWND = /MSWNDI * YALLOC * DISWND 
YIMSWND.TRST = VCC 


/YDRCTM.D := /MRESET * /DISWND 
+ /MRESET * YMEOC * YPIPE * /YDRCTM 
YDRCTM.CLKF = MCLK 
YDRCTM.RSTF = GND 
YDRCTM.SETF = GND
/YDRCTM.TRST = OE 


NC1.D := VCC
NC1.CLKF = MCLK
NC1.RSTF = GND
NC1.SETF = GND
/NC1.TRST = OE


/YMALE.D := /MRESET * /YPIPE * /YMALE
+ /MRESET * /YBGT * YMALE
+ /MRESET * YNOPIPE * YMEOC * /YMALE
+ /MRESET * WMSWND * YMEOC * /YMALE
+ /MRESET * /YMADS * WMNA * YMEOC * /YMALE
+ /MRESET * MNA * WMNA * YMEOC * /YMALE
YMALE.CLKF = MCLK
YMALE.RSTF = GND
YMALE.SETF = GND
/YMALE.TRST = OE


/DISWND.D := /MRESET * /DISWND * YDRCTM 
+ /MRESET * /PXSAS * /YALLOC * YDRCTM 
DISWND.CLKF = MCLK 
DISWND.RSTF = GND 
DISWND.SETF = GND 
/DISWND.TRST = OE 


/WMNA.D := /MRESET * /YNOPIPE * YMADS * YMEOC * /WMNA
+ /MRESET * /YNOPIPE * YMADS * /MNA * WMNA
+ /MRESET * /YPIPE * YMADS * PN * /YMEOC * WMNA
WMNA.CLKF = MCLK
WMNA.RSTF = GND 
WMNA.SETF = GND 
/WMNA.TRST = OE 


/YMBRDY.D := /MRESET * /MBRDY 
YMBRDY.CLKF = MCLK 
YMBRDY.RSTF = GND 

YMBRDY.SETF = GND 

/YMBRDY.TRST = OE


NC2 = VCC 


NC2.TRST = VCC 
240957-84 
PATTERN A
REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/4/91 


CHIP x01 85C224


; This PLD contains the YALLC, YMEMLOCK, YSNPSTB,
; and YMBREQ state machines.


PIN 1 MCLK 

PIN 2 MRESET 
PIN 3 MKEN

PIN 4 MHLDA
PIN 5 YWR

PIN 6 YNOSWND 
PIN 7 YBGT 

PIN 8 XLRDYSRC 
PIN 9 RFO 


PIN 10 SNPDIS 
PIN 11 PALLC 
PIN 13 KLOCK 
PIN 23 PXSAS 


PIN 16 YMLOCK 
PIN 17 SV2
PIN 18 HBASWB
PIN 19 MBREQ
PIN 20 MSNPSTB
PIN 21 SV1
PIN 22 YALLOC


EQUATIONS 
/YMLOCK.D := /MRESET * /YALLOC * YMLOCK 
+ /MRESET * /HBASWB * YMLOCK 
+ /MRESET * YBGT * /YALLOC * /HBASWB 
+ /MRESET * /KLOCK * /YALLOC * /HBASWB 
+ /MRESET * /YBGT * /KLOCK * YMLOCK 
YMLOCK.CLKF = MCLK 
YMLOCK.RSTF = GND 
YMLOCK.SETF = GND 
YMLOCK.TRST = VCC 


/SV2.D := PXSAS * /SV2 
+ PXSAS * /MHLDA * /HBASWB 
SV2.CLKF = MCLK 
SV2.RSTF = GND 
SV2.SETF = GND 
SV2.TRST = VCC 


/HBASWB.D := /MRESET * PXSAS * /HBASWB * SV2 
+ /MRESET * PXSAS * /YBGT * MBREQ * SV2
HBASWB.CLKF = MCLK
HBASWB.RSTF = GND
HBASWB.SETF = GND
HBASWB.TRST = VCC


/MBREQ.D := PXSAS * /MBREQ 
+ PXSAS * HBASWB * /SV2 
+ /PXSAS * /YBGT * MBREQ * HBASWB * SV2
+ MRESET

MBREQ.CLKF = MCLK 

MBREQ.RSTF = GND 

MBREQ.SETF = GND 

MBREQ.TRST = VCC 


/MSNPSTB.D := /MRESET * /YBGT * YNOSWND * YWR * MSNPSTB 
+ /MRESET * /YBGT * YNOSWND * XLRDYSRC * MSNPSTB 
+ /MRESET * /YBGT * YNOSWND * RFO * MSNPSTB 
MSNPSTB.CLKF = MCLK
MSNPSTB.RSTF = GND 
MSNPSTB.SETF = GND 
MSNPSTB.TRST = VCC 


/SV1.D := /YALLOC
SV1.CLKF = MCLK
SV1.RSTF = GND
SV1.SETF = GND
SV1.TRST = VCC


/YALLOC.D := /MRESET * PXSAS * /YALLOC * /SV1 

+ /MRESET * /MKEN * /YALLOC * SV1
+ /MRESET * /YBGT * /PALLC * SNPDIS * /RFO * YALLOC
YALLOC.CLKF = MCLK
YALLOC.RSTF = GND
YALLOC.SETF = GND
YALLOC.TRST = VCC
240957-86
PATTERN A
REVISION 3.1

AUTHOR ISIC SILAS + Andy Bloom 
COMPANY INTEL 

DATE 2/7/91 


CHIP x01 85C224 
; This PLD contains the YMBRDY state machine.


; ---------------- Pin Declarations ----------------


PIN 1 MCLK 

PIN 2 MRESET 
PIN 3 MAOE 

PIN 4 MHLDA 
PIN 5 YNOPIPE 
PIN 6 YPIPE 
PIN 7 MCACHE 
PIN 8 YMEOC 
PIN 9 MEMZBTEN


PIN 10 SYNC
PIN 11 MALDRV
PIN 13 FLUSH
PIN 14 NCPFLD
PIN 15 FPFLDEN
PIN 23 NC4


PIN 16 NC1
PIN 17 NC2
PIN 18 NC3
PIN 19 YMZBT
PIN 20 FPFLD
PIN 21 YFLUSH 
PIN 22 YSYNC 


EQUATIONS 


NC1 = VCC
NC1.TRST = VCC


NC2 = VCC
NC2.TRST = VCC 


NC3 = VCC 
NC3.TRST = VCC 


/YMZBT.D := /MRESET * YPIPE * YMEOC * /YMZBT
+ /MRESET * /MCACHE * YMEOC * /YMZBT
+ /MRESET * YNOPIPE * /YPIPE * /MCACHE * /MEMZBTEN
+ /MRESET * YNOPIPE * /YPIPE * /MCACHE * /YMZBT
+ /MRESET * MHLDA * /MAOE * /MEMZBTEN * YMZBT
+ /MRESET * /YNOPIPE * /MCACHE * /MEMZBTEN * YMZBT
+ MRESET
YMZBT.CLKF = MCLK
YMZBT.RSTF = GND
YMZBT.SETF = GND
YMZBT.TRST = VCC


/FPFLD = FPFLDEN * MRESET
FPFLD.TRST = MRESET


/YFLUSH = MRESET * /NCPFLD 


+ /MRESET * /FLUSH 
YFLUSH.TRST = VCC 


/YSYNC = MRESET * /MALDRV 
+ /MRESET * /SYNC 


YSYNC.TRST = VCC 
TITLE ESIGGEN
PATTERN
REVISION 1.0
AUTHOR
COMPANY INTEL
DATE


CHIP INTEL 85C224


PIN 1 YDRCTM
PIN 2 YMADS 
PIN 3 YMAOE 
PIN 4 MHLDA 
PIN 5 NC1 

PIN 6 MWBWT 
PIN 7 MDRCTM 
PIN 8 SNPDIS 
PIN 9 UNI 

PIN 10 YMSEL 
PIN 11 TR4 
PIN 13 YMFRZ 
PIN 14 MDLDRV 
PIN 23 LMRST 
PIN 15 C8MSEL 
PIN 16 NC2
PIN 17 CDRCTM
PIN 18 CWBWT
PIN 19 MBOFF
PIN 20 MADS
PIN 21 YSNPDIS
PIN 22 C8MFRZ
EQUATIONS 


/C8MSEL = LMRST * /TR4 
+ /LMRST * /YMSEL 
C8MSEL.TRST = VCC 


NC2 = VCC 
NC2.TRST = VCC 


CDRCTM = MDRCTM * YDRCTM 
CDRCTM.TRST = VCC 


/CWBWT = /MWBWT * YDRCTM 
CWBWT.TRST = VCC 


/MBOFF = YMAOE * /MHLDA 
MBOFF.TRST = /MHLDA 


/MADS = /YMADS * /YMAOE 
MADS.TRST = /YMAOE 


YSNPDIS = SNPDIS * UNI 
YSNPDIS.TRST = VCC 


/C8MFRZ = LMRST * /MDLDRV 
+ /LMRST * /YMFRZ 
C8MFRZ.TRST = VCC 


AP-452 
; This PLD drives memory bus and core signals based on the states
; of other state machines


240957-89


TITLE ESWND 

PATTERN A 
REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/4/91 


CHIP x01 85C224


; This PLD contains the XCRDY, XSWND, and XENSWND state machines.


PIN 1 CLK
PIN 2 RESET 
PIN 3 WSDTS 
PIN 4 BGT 
PIN 5 PBGT 
PIN 6 TR4 
PIN 7 YSMSWND 
PIN 8 SNPDIS 
PIN 9 YSMEOC 
PIN 10 SLFTST 
PIN 13 OEx 
PIN 16 ENSWND 
PIN 17 SV3
PIN 18 SWEND
PIN 19 SV2
PIN 20 SV1
PIN 21 CRDY
PIN 22 CRDY1
EQUATIONS


ENSWND.D := /RESET * /YSMSWND 
ENSWND.CLKF = CLK 
ENSWND.RSTF = GND 
ENSWND.SETF = GND 
/ENSWND.TRST = OEx 


/SV3.D := RESET * TR4 
+ /RESET * CRDY * SWEND * /SV3 
+ /RESET * /PBGT * /ENSWND * /YSMSWND * CRDY * SV3 
+ /RESET * /PBGT * CRDY * /SNPDIS * /SWEND * SV3 
+ /RESET * /PBGT * /ENSWND * /YSMSWND * SWEND * SV3
SV3.CLKF = CLK
SV3.RSTF = GND
SV3.SETF = GND
/SV3.TRST = OEx 


/SWEND.D := RESET * TR4
+ /RESET * /CRDY * SWEND * /SV3
+ /RESET * PBGT * CRDY * /SWEND * SV3
+ /RESET * /BGT * /SNPDIS * SWEND * SV3
+ /RESET * ENSWND * CRDY * SNPDIS * /SWEND * SV3
+ /RESET * YSMSWND * CRDY * SNPDIS * /SWEND * SV3
+ /RESET * /BGT * /ENSWND * /YSMSWND * SWEND * SV3
+ /RESET * /PBGT * /ENSWND * /YSMSWND * /CRDY * /SWEND * SV3
SWEND.CLKF = CLK
SWEND.RSTF = GND
SWEND.SETF = GND
/SWEND.TRST = OEx


/SV2.D := RESET * /SLFTST 
+ /RESET * /YSMEOC * CRDY * /SV2 
+ /RESET * /YSMEOC * /CRDY * SV2 
SV2.CLKF = CLK 
SV2.RSTF = GND 
SV2.SETF = GND 
/SV2.TRST = OEx 


/SV1.D := /RESET * /YSMEOC * CRDY 
+ /RESET * CRDY * /SV1 * SV2 
SV1.CLKF = CLK 
SV1.RSTF = GND
SV1.SETF = GND 
/SV1.TRST = OEx 


/CRDY.D := RESET * /SLFTST 
+ /RESET * /YSMEOC * /WSDTS * CRDY * SV2 
+ /RESET * /WSDTS * CRDY * /SV1 * SV2 
CRDY.CLKF = CLK
CRDY.RSTF = GND 
CRDY.SETF = GND 
/CRDY.TRST = OEx 


/CRDY1.D := RESET * /SLFTST 
+ /RESET * /YSMEOC * /WSDTS * CRDY1 * SV2 
+ /RESET * /WSDTS * CRDY1 * /SV1 * SV2 
CRDY1.CLKF = CLK 
CRDY1.RSTF = GND
CRDY1.SETF = GND 


/CRDY1.TRST = OEx 
240957-91 
TITLE EWCPLB
PATTERN A 

REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/4/91 


CHIP x01 85C224 
; This PLD contains the XWCPLB and YCPULEN state machines.


PIN 1 CLK
PIN 2 RESET
PIN 3 CRDY
PIN 4 RDYSRC
PIN 5 BGT
PIN 6 PBGT
PIN 7 KCACHE
PIN 8 LEN
PIN 9 CACHE
PIN 10 CKEN
PIN 11 BRDY
PIN 13 OEx
PIN 16 CLEN4
PIN 17 CLEN2
PIN 18 CLEN1
PIN 19 LKCACHE
PIN 20 SV
PIN 21 CPUEN
PIN 22 WCPLB
EQUATIONS 
/CLEN4.D := CPUEN * /CLEN2 * /CLEN4
+ /BRDY * /CLEN1
+ BRDY * CLEN2 * /CLEN4
+ /CACHE * /CKEN * /LKCACHE * /CLEN2 * /CLEN4
+ RESET
CLEN4.CLKF = CLK
CLEN4.RSTF = GND
CLEN4.SETF = GND
/CLEN4.TRST = OEx


/CLEN2.D := CPUEN * /CLEN2 * /CLEN4 
+ BRDY * /CLEN2 * CLEN4 
+ /BRDY * CLEN2 * CLEN4 
+ LEN * CACHE * /CLEN2 * /CLEN4 
+ LEN * CKEN * /CLEN2 * /CLEN4 
+ LEN * LKCACHE * /CLEN2 * /CLEN4
+ RESET
CLEN2.CLKF = CLK 
CLEN2.RSTF = GND 
CLEN2.SETF = GND 
/CLEN2.TRST = OEx 
/CLEN1.D := /RESET * BRDY * /CLEN1 
+ /RESET * /BRDY * /CLEN2 * CLEN4
+ /RESET * /CPUEN * /LEN * CACHE * /CLEN2 * /CLEN4
+ /RESET * /CPUEN * /LEN * CKEN * /CLEN2 * /CLEN4
+ /RESET * /CPUEN * /LEN * LKCACHE * /CLEN2 * /CLEN4
CLEN1.CLKF = CLK
CLEN1.RSTF = GND
CLEN1.SETF = GND
/CLEN1.TRST = OEx


/LKCACHE.D := /KCACHE 
LKCACHE.CLKF = CLK 
LKCACHE.RSTF = GND 
LKCACHE.SETF = GND 
/LKCACHE.TRST = OEx 


/SV.D := /RESET * CRDY * /SV 
+ /RESET * /RDYSRC * /BGT * CPUEN * SV 
+ /RESET * CRDY * /BRDY * /CLEN1 * WCPLB * /CPUEN 
SV.CLKF = CLK 
SV.RSTF = GND 
SV.SETF = GND 
/SV.TRST = OEx 


/CPUEN.D := /RESET * BRDY * /CPUEN 
* CLEN1 * /CPUEN 
: RDYSRC * /BGT * /WCPLB 
RDYSRC * /BGT * CPUEN * SV 
CPUEN.CLKF = 
CPUEN .RSTF 
CPUEN.SETF = 
/CPUEN.TRST = OEx 


/WCPLB.D := /RESET * BRDY * /WCPLB 
* CLEN1 * /WCPLB 

/CRDY * BRDY * /CPUEN 

/CRDY * CLEN1 * /CPUEN 


WCPLB. CLKF 
WCPLB.RSTF 
WCPLB.SETF = GND 


/WCPLB.TRST = OEx 
TITLE EWMSWND
PATTERN A 

REVISION 2.0 

AUTHOR ISIC SILAS 
COMPANY INTEL 

DATE 2/4/91 


CHIP x01 85C224 




PIN 1 MCLK 
PIN 2 MRESET 
PIN 3 XSAS 
PIN 4 XSNPWB 
PIN 5 YPIPE
PIN 6 YNOPIPE
PIN 7 YMEOC
PIN 8 MHITMI 
PIN 9 YMSWEND 


PIN 10 YNOSWND 
PIN 11 YBGT 
PIN 13 OEx

PIN 14 YALLOC 
PIN 23 PCTCXFR 


PIN 7k PXSAS 
PIN ae YSWEHITM 
EQUATIONS


UNUSED = VCC
UNUSED.TRST = VCC


; This PLD contains the YENMSWND, YWMSWND, and YENXSAS state machines.


/PSWBAS = /XSAS * /XSNPWB * /ENXSAS 


PSWBAS.TRST = VCC 


/SV.D := YMEOC * /WMSWND * /SV 


+ /YNOSWND * /YPIPE * YMEOC * /WMSWND 
+ /YPIPE * /YMSWEND * /ENMSWND * YMEOC * /WMSWND


SV.CLKF = MCLK 
SV.RSTF = GND 
SV.SETF = GND 
/SV.TRST = OEx 




/WMSWND.D := /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 
+ /MRESET 

WMSWND.CLKF = MCLK 

WMSWND.RSTF = GND 


/WMSWND * 


% % % % 


/YNOSWND * 


% % % 


~ 


YMEOC * /WASWIND 


/SV 


/YNOSWND * /YPIPE * /WMSWND 


/YBGT * WMSWND 


/YPIPE * /YMSWEND * /ENMSWND * /WMSWND 

YPIPE * /PCTCXFR * /YALLOC * /WMSWND 

/YMSWEND * /ENMSWND * /YALLOC * WMSWND | 

YNOSWND * /YNOPIPE * /YMSWEND * /ENMSWND * WMSWND 
WMSWND.SETF = GND
/WMSWND.TRST = OEx 


/ENMSWND.D := YMSWEND 
ENMSWND.CLKF = MCLK
ENMSWND.RSTF = GND
ENMSWND.SETF = GND 
/ENMSWND.TRST = OEx 


/ENXSAS.D := YBGT * /ENXSAS 
+ XSAS * ENXSAS 
+ MRESET 
ENXSAS.CLKF = MCLK 
ENXSAS.RSTF = GND 
ENXSAS.SETF = GND 
/ENXSAS.TRST = OEx


/PXSAS = /XSAS * XSNPWB * /ENXSAS 
PXSAS.TRST = VCC


/YSWEHITM = /YMSWEND * /ENMSWND * /MHITMI * YALLOC 


+ /YMSWEND * /ENMSWND * /MHITMI * YNOPIPE * YPIPE 
YSWEHITM.TRST = VCC 
PRELIMINARY


80960SA/80960SB
EMBEDDED 32-BIT PROCESSORS
WITH 16-BIT BURST DATA BUS


High-Performance Embedded Architecture
- 16 MIPS Burst Execution at 16 MHz
- 5 MIPS* Sustained Execution at 16 MHz

512-Byte On-Chip Instruction Cache
- Direct Mapped
- Parallel Load/Decode for Uncached Instructions

Multiple Register Sets
- Sixteen Global 32-Bit Registers
- Sixteen Local 32-Bit Registers
- Four Local Register Sets Stored On-Chip
- Register Scoreboarding

Built-In Interrupt Controller
- 4 Direct Interrupt Pins
- 32 Priority Levels, 256 Vectors

Built-In Floating Point Unit (80960SB only)
- Fully IEEE 754 Compatible

Easy to Use, High Bandwidth 16-Bit Bus
- 25.6 Mbyte/sec Burst
- Up to 16 Bytes Transferred per Burst

32-Bit Address Space, 4 Gigabytes

Software Compatible with 80960KA/KB/CA Processors

80-Lead Quad Flat Pack (EIAJ QFP)

84-Lead Plastic Leaded Chip Carrier (PLCC)

The 80960SA and 80960SB are members of Intel’s i960 32-bit processor family, which is designed especially
for low-cost embedded applications. They are based on the family’s high-performance common core architecture
and include a 512-byte instruction cache and a built-in interrupt controller. The 80960SA and 80960SB
have a large register set, multiple parallel execution units and a high-bandwidth, 16-bit burst bus. Using
advanced RISC technology, these processors are capable of execution rates in excess of
5 million instructions per second.* The 80960SA and 80960SB are well suited to a wide range of cost-sensitive
embedded applications such as laser printers, EISA- and MCA-based disk controllers and X terminals.


*Relative to Digital Equipment Corporation’s VAX-11/780** at 1 MIPS


[Block diagram of the 80960SB: four 80-bit floating-point registers, sixteen 32-bit global registers,
local register sets, 80-bit floating-point unit, instruction fetch unit, instruction decoder,
512-byte instruction cache, 32-bit integer execution unit, micro-instruction sequencer,
bus control logic and interrupt controller; 32-bit address, 16-bit data.]


**VAX-11™ is a trademark of Digital Equipment Corporation.
THE i960™ PROCESSOR SERIES


The 80960SA and 80960SB are members of a new
family of 32-bit microprocessors from Intel known as
the i960 Series. This series was designed especially
to serve the needs of embedded applications. The
embedded market includes applications as diverse
as industrial automation, avionics, image processing,
graphics, robotics, telecommunications and
automobiles. These types of applications require high
integration, low power consumption, quick interrupt
response times and high performance. Since time to
market is critical, embedded microprocessors need
to be easy to use in both hardware and software
designs.


[Figure 2 shows the register set: sixteen 32-bit global registers (g0 through g15), four 80-bit
floating-point registers (fp0 through fp3), sixteen 32-bit local registers (r0 through r15),
and the arithmetic controls, instruction pointer, process controls and trace controls.]


NOTES:
1. Register g15 is reserved for stack management functions.
2. Floating-point registers and operations are available only in the 960SB and 960KB processors.
3. Registers r0, r1 and r2 are reserved for stack management functions.
4. Register g14 is used by BAL and BALX instructions.


Figure 2. 80960 Register Set
All members of the 80960 series share a common
core architecture which utilizes RISC technology so
that, except for special functions, the family members
are object code compatible. Each new processor
in the series will add its own special set of functions
to the core to satisfy the needs of a specific
application or range of applications for the embedded
market. For example, future processors may include
a DMA controller, a timer or an A/D converter.


Software written for the 80960SA and 80960SB will
run without modification on any other member of the
80960 family. The 80960SA is pin compatible with
the 80960SB, which includes an integrated
floating-point unit.


Key Performance Features


The architecture of the 80960SA and 80960SB is based
on the most recent advances in RISC technology
and is grounded in Intel’s long experience in designing
embedded controllers. Many features contribute
to the exceptional performance of the 80960SA and
80960SB.


1. Large Register Set. Modern compilers can take
advantage of a large number of registers to optimize
execution speed. For maximum flexibility, the
80960SA and 80960SB provide 32 32-bit registers
and four 80-bit floating-point registers. (See
Figure 2.)


2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most programs,
so execution speed can be greatly improved by
ensuring that these core instructions execute in as
short a time as possible. The most frequently
executed instructions, such as register-register moves,
add/subtract, logical operations, and shifts, execute
in one to two cycles (Table 1 contains a list of
instructions).


3. Load/Store Architecture. Like other processors
based on RISC technology, the 80960SA and
80960SB have a Load/Store architecture. Only the
LOAD and STORE instructions reference memory;
all other instructions operate on registers. This type
of architecture simplifies instruction decoding and is
used in combination with other techniques to
increase parallelism.


4. Simple Instruction Formats. All instructions in
the 80960SA and 80960SB are 32 bits long and
must be aligned on word boundaries. This alignment
makes it possible to eliminate the instruction-alignment
stage in the pipeline. To simplify the instruction
decoder further, there are only five instruction
formats and each instruction type uses only one
format. (See Figure 3.)
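
The word-alignment rule can be illustrated with a short Python sketch. It takes the statement above at face value (every instruction is one 32-bit word); the function names are hypothetical and are not part of any Intel tool.

```python
INSTRUCTION_BYTES = 4  # every instruction is one 32-bit word


def is_word_aligned(address: int) -> bool:
    """True if the address lies on a 4-byte (word) boundary."""
    return address % INSTRUCTION_BYTES == 0


def instruction_addresses(start: int, count: int) -> list[int]:
    """Addresses of `count` consecutive instructions starting at `start`."""
    if not is_word_aligned(start):
        raise ValueError("instructions must start on a word boundary")
    return [start + i * INSTRUCTION_BYTES for i in range(count)]
```

Because every instruction starts on a word boundary, the fetch logic never has to shift or reassemble an instruction that straddles two words, which is why no alignment stage is needed.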


5. Overlapped Instruction Execution. A load
operation allows execution of subsequent instructions to
continue before the data has been returned from
memory, so that these instructions can overlap the
load. The 80960SA and 80960SB manage this
process transparently to software through the use of a
register scoreboard. Conditional instructions also
make use of a scoreboard so that subsequent
unrelated instructions can be executed while the
conditional instruction is pending.
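
The scoreboard idea can be sketched in a few lines of Python. This is a simplified model, not the processor's actual logic: it assumes one busy flag per register, set when a load is issued and cleared when the data returns. The class and method names are hypothetical.

```python
class Scoreboard:
    """Tracks registers whose load data has not yet returned from memory."""

    def __init__(self) -> None:
        self.busy: set[str] = set()

    def issue_load(self, dest: str) -> None:
        # Mark the destination register as pending.
        self.busy.add(dest)

    def load_complete(self, dest: str) -> None:
        # Data has arrived; the register may be used again.
        self.busy.discard(dest)

    def can_issue(self, sources: list[str], dest: str) -> bool:
        # An instruction may proceed only if it touches no pending register.
        return not (self.busy & (set(sources) | {dest}))
```

After `issue_load("r4")`, an instruction that uses only r5, r6 and r7 can still issue, while one that reads r4 has to wait until `load_complete("r4")` is called.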


[Figure 3 shows the five instruction formats: Control, Compare and Branch, Register, and two Memory formats.]


Figure 3. Instruction Formats
Table 1. 80960SA and 80960SB Instruction Set

Data Movement:      Load, Store, Move, Load Address

Arithmetic:         Add, Subtract, Multiply, Divide, Remainder, Modulo, Shift,
                    Extended Multiply, Extended Divide

Logical:            And, Not And, And Not, Or, Exclusive Or, Not Or, Or Not, Nor,
                    Exclusive Nor, Not, Nand, Rotate

Bit and Bit Field:  Set Bit, Clear Bit, Not Bit, Check Bit, Alter Bit, Scan for Bit,
                    Scan over Bit, Extract, Modify

Comparison:         Compare, Conditional Compare, Compare and Increment,
                    Compare and Decrement

Branch:             Unconditional Branch, Conditional Branch, Compare and Branch

Call/Return:        Call, Call Extended, Call System, Return, Branch and Link

Fault:              Conditional Fault, Synchronize Faults

Debug:              Modify Trace Controls, Mark, Force Mark

Miscellaneous:      Atomic Add, Atomic Modify, Flush Local Registers,
                    Modify Arithmetic Controls, Scan Byte for Equal, Test Condition Code

Decimal:            Move, Add with Carry, Subtract with Carry

Conversion (80960SB only):
                    Convert Real to Integer, Convert Integer to Real

Floating-Point (80960SB only):
                    Move Real, Add, Subtract, Multiply, Divide, Remainder, Scale, Round,
                    Square Root, Sine, Cosine, Tangent, Arctangent, Log, Log Binary,
                    Log Natural, Exponent, Classify, Copy Real Extended, Compare

Synchronous:        Synchronous Load, Synchronous Move
6. Integer Execution Optimization. When the
result of an operation is used as an operand in a
subsequent calculation, the value is sent immediately to
its destination register. At the same time, the
value is put back on a bypass path to the ALU,
thereby saving the time that otherwise would be
required to retrieve the value for the next operation.


7. Bandwidth Optimizations. The 80960SA and 
80960SB get optimal use of their memory bus band- 
width because the bus is tuned for use with the 
cache; the line size of the instruction cache matches 
the maximum burst size for instruction fetches. The 
80960SA and 80960SB automatically fetch four 
words in a burst and store them directly in the 
cache. Due to the size of the cache and the fact that 
it is continually filled in anticipation of needed in- 
structions in the program flow, the 80960SA and 
80960SB are exceptionally insensitive to memory 
wait states. In fact, each wait state causes only a 
10% degradation in system performance. The bene- 
fit is that the 80960SA and 80960SB will deliver out- 
standing performance even with a low cost memory 
system. 
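
The bandwidth and wait-state figures can be checked with a little arithmetic. The Python sketch below makes two assumptions that the text does not state: a 16-byte burst occupies 10 clock cycles (the value that reproduces the quoted 25.6 Mbyte/sec at 16 MHz), and the 10% degradation per wait state adds up linearly. Both function names are hypothetical.

```python
def burst_bandwidth_mbytes_per_s(bytes_per_burst: int,
                                 clocks_per_burst: int,
                                 clock_mhz: float) -> float:
    """Burst bandwidth in Mbyte/s for a burst of the given length."""
    return bytes_per_burst * clock_mhz / clocks_per_burst


def relative_performance(wait_states: int,
                         loss_per_wait_state: float = 0.10) -> float:
    """Fraction of zero-wait-state performance, assuming a linear loss."""
    return max(0.0, 1.0 - loss_per_wait_state * wait_states)
```

Under these assumptions, `burst_bandwidth_mbytes_per_s(16, 10, 16.0)` reproduces the 25.6 Mbyte/sec burst figure, and a memory system with two wait states would still deliver about 80% of zero-wait-state performance.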


8. Cache Bypass. If there is a cache miss, the proc- 
essor fetches the needed instruction, then sends it 
on to the instruction decoder at the same time it 
updates the cache. Thus, no extra time is taken to 
load and read the cache. 


Memory Space and Addressing Modes 


The 80960SA and 80960SB offer a linear program- 
ming environment so that all programs running on 
the processors are contained in a single address 
space. The maximum size of the address space is 
4 Gigabytes. 


For ease of use, the 80960SA and 80960SB have a
small number of addressing modes, but include all
those necessary to ensure efficient compiler
implementations of high-level languages such as C,
Fortran and Ada. Table 2 lists the memory addressing
modes.
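
As a rough illustration, the most general mode in Table 2, Register + (Index-Register x Scale-Factor) + Displacement, can be modeled as below; the simpler modes fall out by leaving arguments at zero. This is a sketch only: the function name is hypothetical, and wrapping the result to 32 bits is an assumption based on the 4-Gigabyte linear address space.

```python
MASK32 = 0xFFFFFFFF
VALID_SCALES = (1, 2, 4, 8, 16)  # scale factors listed with Table 2


def effective_address(base: int = 0, index: int = 0,
                      scale: int = 1, displacement: int = 0) -> int:
    """Compute base + index * scale + displacement in a 32-bit space."""
    if scale not in VALID_SCALES:
        raise ValueError("scale factor must be 1, 2, 4, 8 or 16")
    return (base + index * scale + displacement) & MASK32
```

For example, Register + 12-Bit Offset is `effective_address(base=reg, displacement=offset)`, and an array access with 4-byte elements is `effective_address(base=array, index=i, scale=4)`.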




Data Types 


The 80960SA and 80960SB recognize the following
data types:


Numeric:

- 8-, 16-, 32- and 64-bit ordinals
- 8-, 16-, 32- and 64-bit integers
- 8-, 16-, 32-, 64- and 80-bit reals


Non-Numeric:

- Bit
- Bit Field
- Triple-Word (96 bits)
- Quad-Word (128 bits)

Large Register Set


The programming environment of the 80960SA and
80960SB includes a large number of registers. In fact,
32 registers are available at any time. The availability
of this many registers greatly reduces the number of
memory accesses required to execute most programs,
which leads to greater instruction processing
speed.


There are two types of general-purpose registers:
local and global. The global registers consist of sixteen
32-bit registers (G0 through G15). These registers
perform the same function as the general-purpose
registers provided in other popular microprocessors.
The term global refers to the fact that these
registers retain their contents across procedure
calls.


The local registers, on the other hand, are procedure
specific. For each procedure call, the 80960SA
and 80960SB allocate 16 local registers (R0 through
R15). Each local register is 32 bits wide.


Multiple Register Sets 


To further increase the efficiency of the register set,
multiple sets of local registers are stored on-chip.
This cache holds up to four local register frames,
which means that up to three procedure calls can be
made without having to access the procedure stack
resident in memory.
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, Table 2. Memory Addressing Modes 


12-Bit Offset. 

32-Bit Offset 

Register-Indirect 

Register + 12-Bit Offset 
_ Register + 32-Bit Offset 


- Scale-Factor is 1, 2, 4, 8o0r16 


Although programs may have procedure calls nest- 
ed many calls deep, a program typically oscillates 


back and forth between only two or three levels. As. 


a result, with four stack frames in the cache, the 
probability of there being a free frame on the cache 
when a call is made is very high. In fact, runs of 
representative C-language programs show that 80% 


~~ of the calls are handled without needing to access 
memory. 4 | 


If there are four or more active procedures and a 
new procedure is called, the igrcadel moves the 


Register 
- One of Four - - 


oe Local 
_ Register Sets 


Register + (Index-Register x Scale-Factor) 
Register x Scale Factor + 32-Bit Displacement | 
Register + (Index-Register x Scale-Factor) + 32-Bit Displacement 


oldest set of local registers in the register cache to.a 
procedure stack in memory to make room for a new 
set of registers. Global register G15 is used by the 
processor as the frame pointer (FP) for the PIOGe: 
dure stack. 


Note that the global registers are not exchanged on a procedure call,
but retain their contents, making them available to all procedures for
fast parameter passing. An illustration of the register cache is
shown in Figure 4.
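
The behavior of the four-frame register cache can be illustrated with a small model. The following Python sketch is not taken from the data sheet: the function name, the trace format ('c' for call, 'r' for return) and the policy of reading back one frame when a return finds the cache empty are assumptions made for illustration only.

```python
CACHE_FRAMES = 4  # from the text: the cache holds up to four local register frames


def simulate(trace):
    """Count memory accesses caused by a call/return trace.

    trace is a string of 'c' (call) and 'r' (return) events.
    Returns (spills, fills): frames written to and read back from
    the procedure stack in memory. The fill policy is an assumption.
    """
    depth = 1    # active procedures, starting with the initial one
    cached = 1   # frames currently held on-chip
    spills = fills = 0
    for event in trace:
        if event == "c":
            depth += 1
            if cached == CACHE_FRAMES:
                spills += 1      # oldest frame is moved to memory
            else:
                cached += 1
        elif event == "r":
            depth -= 1
            cached -= 1
            if cached == 0 and depth > 0:
                fills += 1       # caller's frame must come back from memory
                cached = 1
        else:
            raise ValueError("unknown event: " + event)
    return spills, fills
```

With this model, three nested calls cause no memory traffic, and a program that oscillates between two levels after going four calls deep spills only once, which is the effect the text describes.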


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Figure 4. Multiple Register Sets are Stored On-Chip 
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Instruction Cache 


To further reduce memory accesses, the 80960SA 
and 80960SB include a 512-byte on-chip instruction 
cache. The instruction cache is based on the con- 
cept of locality of reference; that is, most programs 
are not usually executed in a steady stream but con- 
sist of many branches and loops that lead to jumping 
back and forth within the same small section of 
code. Thus, by maintaining a block of instructions in 
a cache, the number of memory references required 
to read instructions into the processor can be greatly 
reduced. 


To load the instruction cache, instructions are 
fetched in 16-byte blocks, so that up to four instruc- 
tions can be fetched at one time. 


Code for small loops will often fit entirely within the cache, leading
to a great increase in processing speed since further memory
references might not be necessary until the program exits the loop.
Similarly, when calling short procedures, the code for the calling
procedure is likely to remain in the cache, so it will be there on the
procedure's return.
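
The cache figures quoted above can be turned into simple arithmetic. This Python sketch assumes 4-byte instruction words (consistent with "up to four instructions" per 16-byte block) and a loop that starts on a block boundary; both are assumptions, and the function names are illustrative.

```python
CACHE_BYTES = 512        # from the text: 512-byte on-chip instruction cache
BLOCK_BYTES = 16         # from the text: instructions are fetched in 16-byte blocks
INSTRUCTION_BYTES = 4    # assumption: one instruction per 32-bit word


def cache_geometry():
    """Return (number of 16-byte blocks, number of 4-byte instructions)."""
    return CACHE_BYTES // BLOCK_BYTES, CACHE_BYTES // INSTRUCTION_BYTES


def loop_fits(loop_instructions):
    """True if a loop of this many instructions fits entirely in the cache."""
    return loop_instructions * INSTRUCTION_BYTES <= CACHE_BYTES


def block_fetches(loop_instructions):
    """Block fetches needed to load the loop once (block-aligned start assumed)."""
    loop_bytes = loop_instructions * INSTRUCTION_BYTES
    return -(-loop_bytes // BLOCK_BYTES)  # ceiling division
```

Under these assumptions the cache holds 32 blocks, or 128 instructions, so a loop of up to 128 single-word instructions needs no further instruction fetches once it is loaded.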


Register Scoreboarding 


The instruction decoder has been optimized in several ways. One of
these optimizations is the ability to do instruction overlapping by
means of register scoreboarding.


Register scoreboarding occurs when a LOAD instruction is executed to
move a variable from memory into a register. When the instruction is
initiated, a scoreboard bit on the target register is set. When the
register is actually loaded, the bit is reset. In between, any
reference to the register contents is accompanied by a test of the
scoreboard bit to ensure that the load has completed before processing
continues. Since the processor does not have to wait for the LOAD to
be completed, it can go on to execute additional instructions placed
in between the LOAD instruction and the instruction that uses the
register contents, as shown in the following example:


LOAD address 1, R4
LOAD address 2, R5
Unrelated instruction
Unrelated instruction
ADD R4, R5, R6


In essence, the two unrelated instructions between the LOAD and ADD
instructions are executed for free (i.e., take no apparent time to
execute) because they are executed while the register is being loaded.
Up to three instructions can be pending at one time with three
corresponding scoreboard bits set. By exploiting this feature, system
programmers and compilers have a useful tool for optimizing execution
speed.
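
The "free" instructions can be seen in a toy timing model. The Python sketch below is an illustration only: the one-cycle issue time, the three-cycle load latency and the tuple format for instructions are assumptions, not figures from this data sheet. Only the limit of three pending loads comes from the text.

```python
MAX_PENDING = 3  # from the text: up to three loads may be pending at one time


def run(program, load_latency=3):
    """Return the cycle count for a list of (op, dest, sources) tuples.

    A LOAD sets the scoreboard bit of its destination register; the bit
    is reset load_latency cycles after issue (assumed value). Any
    instruction that reads a scoreboarded register stalls until the bit
    is reset.
    """
    ready_at = {}  # register -> cycle at which its scoreboard bit is reset
    cycle = 0
    for op, dest, sources in program:
        for reg in sources:
            cycle = max(cycle, ready_at.get(reg, 0))  # stall on pending loads
        if op == "LOAD":
            pending = sorted(t for t in ready_at.values() if t > cycle)
            if len(pending) >= MAX_PENDING:
                cycle = pending[len(pending) - MAX_PENDING]  # wait for a free slot
            ready_at[dest] = cycle + load_latency
        cycle += 1  # every instruction takes one cycle to issue (assumed)
    return cycle


with_fill = [
    ("LOAD", "R4", ()),
    ("LOAD", "R5", ()),
    ("OTHER", None, ()),   # unrelated instruction
    ("OTHER", None, ()),   # unrelated instruction
    ("ADD", "R6", ("R4", "R5")),
]
without_fill = [with_fill[0], with_fill[1], with_fill[4]]
```

In this model both sequences finish in the same number of cycles, so the two unrelated instructions add no apparent execution time.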


Floating-Point Arithmetic 


In the 80960SB, floating-point arithmetic has been made an integral
part of the architecture. Having the floating-point unit integrated
on-chip provides two advantages. First, it improves the performance of
the chip for floating-point applications, since no additional bus
overhead is associated with floating-point calculations, thereby
leaving more time for other bus operations such as I/O. Second, the
cost of using floating-point operations is reduced because a separate
coprocessor chip is not required.


The 80960SB floating-point (real number) data types include
single-precision (32-bit), double-precision (64-bit) and extended
precision (80-bit) floating-point numbers. Any register may be used to
execute floating-point operations.


The processor provides hardware support for both 
mandatory and recommended portions of IEEE 
Standard 754 for floating-point arithmetic, exponen- 
tial, logarithmic and other transcendental functions. 
Table 3 shows execution times for some representa- 
tive instructions. 


Table 3. Sample Floating-Point Execution Times (µs) at 16 MHz

[Table values are not legible in the source; the one legible row label is Square Root.]


High Bandwidth Bus 


The 80960SA and 80960SB CPUs reside on a high-bandwidth address/data
bus. The bus provides a direct communication path between the
processor and the memory and I/O subsystem interfaces. The processor
uses the bus to fetch instructions, manipulate memory and respond to
interrupts. Its features include:

• 16-bit data path multiplexed onto the lower bits of the 32-bit
  address path

• Eight 16-bit half-word burst capacity, which allows transfers from
  1 to 16 bytes at a time

• High bandwidth reads and writes at 25.6 Mbytes per second
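
One accounting that reproduces the 25.6 Mbytes per second figure is sketched below in Python. It assumes a 16 MHz processor clock and a full 16-byte burst that takes ten clocks: one address cycle (Ta), eight data cycles (Td) and one recovery cycle (Tr), with no wait states. The cycle breakdown is an assumption made for illustration; the data sheet text here states only the resulting bandwidth.

```python
def burst_bandwidth(clock_hz, bytes_per_burst=16, address_cycles=1,
                    data_cycles=8, recovery_cycles=1, wait_states=0):
    """Peak bytes per second for back-to-back bursts (assumed cycle counts)."""
    cycles = address_cycles + data_cycles + recovery_cycles + wait_states
    return bytes_per_burst * clock_hz / cycles


peak_16mhz = burst_bandwidth(16_000_000)   # 16 bytes every 10 clocks at 16 MHz
peak_10mhz = burst_bandwidth(10_000_000)   # same burst at 10 MHz
```

Each wait state (Tw) added to a burst lengthens the cycle count and lowers the result accordingly.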


Figure 5 identifies the groups of signals which constitute the Bus.
Table 4 lists the function of the Bus and other processor-support
signals, such as the interrupt lines.


Interrupt Handling


The 80960SA and 80960SB can be interrupted in one of two ways: by the
activation of one of four interrupt pins or by sending a message on
the processor's data bus.


The 80960SA and 80960SB are unusual in that they automatically handle
interrupts on a priority basis and track pending interrupts through
their on-chip interrupt controller. Two of the interrupt pins can be
configured to provide 8259A handshaking for expansion beyond four
interrupt lines.


960SA/SB Bus Signal Groups: Address (32 lines) / Data (16 lines)


Figure 5. 80960SA and 80960SB Bus Signal Groups 
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Debug Features 


The 80960SA and 80960SB have built-in debug ca- 
pabilities. There are two types of breakpoints and six 
different trace modes. The debug features are con- 
trolled by two internal 32-bit registers, the Process- 
Controls Word and the Trace-Controls Word. By set- 
ting bits in these control words, a software debug 
monitor can closely control how the processor responds
during program execution.


The 80960SA and 80960SB have both hardware 
and software breakpoints. They provide two hard- 
ware breakpoint registers on-chip which can be set 
by a special command to any value. When the in- 
struction pointer matches the value in one of the 
breakpoint registers, the breakpoint will fire, and a 
breakpoint handling routine is called automatically. 


Tracing is available for all instructions (single-step execution),
calls and returns and branching. Each different type of trace may be
enabled separately by a special debug instruction. In each case, the
80960SA and 80960SB execute the instruction first and then call a
trace handling routine (usually part of a software debug monitor).
Further program execution is halted until the trace routine is
completed. When the trace event handling routine is completed,
instruction execution resumes at the next instruction. The 80960SA and
80960SB's tracing mechanisms, which are implemented completely in
hardware, greatly simplify the task of testing and debugging software.


FAULT DETECTION 


The 80960SA and 80960SB have an automatic mechanism to handle faults.
There are ten fault types including trace, arithmetic, and
floating-point faults. When the processor detects a fault, it
automatically calls the appropriate fault handling routine and saves
the current instruction pointer and necessary state information to
make efficient recovery possible. The processor posts diagnostic
information on the type of fault to a Fault Record. Like interrupt
handling routines, fault handling routines are usually written to meet
the needs of a specific application and are often included as part of
the operating system or kernel.


For each of the ten fault types, there are numerous 
subtypes that provide specific information about a 
fault. For example, a floating-point fault may have its 
subtype set to an Overflow or Zero-Divide fault. The 
fault handler can use this specific information to re- 
spond correctly to the fault. 


BUILT-IN TESTABILITY 


Upon reset, the 80960SA and 80960SB automatically conduct an extensive
internal test (self-test) of their major blocks of logic. Then, before
executing the first instruction, the processor does a zero checksum on
the first eight words in memory to ensure that the system has been
loaded correctly. If a problem is discovered at any point during the
self-test, the 80960SA and 80960SB will indicate a failure and will
not begin program execution. The self-test takes approximately
47,000 cycles to complete, and can be disabled.
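
A zero checksum over the first eight words can be sketched as follows. This Python example assumes the check is a plain sum of eight 32-bit words taken modulo 2^32; the exact arithmetic the processor uses (for example, how carries are treated) is not described in this section, so the details and the helper names are assumptions.

```python
MASK32 = 0xFFFFFFFF


def zero_checksum_ok(words):
    """True if eight 32-bit words sum to zero modulo 2**32 (assumed rule)."""
    if len(words) != 8:
        raise ValueError("expected exactly eight words")
    return sum(words) & MASK32 == 0


def balancing_word(first_seven):
    """Eighth word that makes the assumed checksum come out to zero."""
    if len(first_seven) != 7:
        raise ValueError("expected exactly seven words")
    return (-sum(first_seven)) & MASK32


# Hypothetical initialization words, used only to exercise the functions.
seven = [0x00001000, 0x00002000, 0x00000000, 0xFFFF0000, 0x12345678, 0x0, 0x1]
image = seven + [balancing_word(seven)]
```

A tool that builds a boot image could use a helper like balancing_word to choose the last word so that the check passes.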


System manufacturers can use the 80960SA and 80960SB's self-test
feature during incoming parts inspection. No special diagnostic
programs need to be written, and the test is both thorough and fast.
The self-test capability helps ensure that defective parts will be
discovered before systems are shipped, and once in the field, the
self-test makes it easier to distinguish between problems caused by
processor failure and problems resulting from other causes.


CHMOS 


The 80960SA and 80960SB are fabricated using Intel's CHMOS IV
(Complementary High Speed Metal Oxide Semiconductor) process. This
advanced technology eliminates the frequency and reliability
limitations of older CMOS processes and opens a new era in
microprocessor performance. It combines the high performance
capabilities of Intel's industry-leading HMOS technology with the high
density and low power characteristics of CMOS. The 80960SA and 80960SB
are available at 10 MHz in both PLCC and QFP packages, and at 16 MHz
in the PLCC package.




Table 4. 80960SA and 80960SB Pin Description: Bus Signals

Symbol          Name and Function

CLK2            SYSTEM CLOCK provides the fundamental timing for 80960SA
                and 80960SB systems. CLK2 is divided by two inside the
                80960SA and 80960SB to generate the internal processor
                clock.

A31-A16         ADDRESS BUS carries the upper 16 bits of the 32-bit
                address to memory. It is valid throughout the burst
                cycle; no latch is required.

AD15-AD1, D0    ADDRESS/DATA BUS carries the low-order bits of the
(I/O, T.S.)     32-bit address and 16-bit data to and from memory.
                AD15-AD4 must be latched since the cycle following the
                address cycle carries data on the bus.

A3-A1           ADDRESS BUS carries the word addresses of the 32-bit
                address to memory. These three bits are incremented
                during a burst access indicating the next word address
                of the burst access. Note that A3-A1 are duplicated
                with AD3-AD1 during the address cycle.

ALE             ADDRESS LATCH ENABLE indicates the transfer of a
                physical address. ALE is asserted during a Ta cycle and
                deasserted before the beginning of the following Td
                state. It is active high and floats to a high impedance
                state during a hold cycle (Th or Thr).

AS              ADDRESS STATUS indicates an address state. AS is
                asserted every Ta state and deasserted during the
                following Td state. AS is driven HIGH during reset.

W/R             WRITE/READ specifies, during a Ta cycle, whether the
                operation is a write or read. It is latched on-chip and
                remains valid during Td cycles.

DEN             DATA ENABLE is asserted during Td cycles and indicates
                transfer of data on the AD lines. The AD lines should
                not be driven by an external source unless DEN is
                asserted. When DEN is asserted, the outputs from the
                previous cycle are guaranteed to be 3-stated. In
                addition, DEN deasserted indicates inputs have been
                captured and therefore input hold times can be
                disregarded. DEN is driven HIGH during reset.

READY           READY indicates that data on AD lines can be sampled or
                removed. If READY is not asserted during a Td cycle, the
                Td cycle is extended to the next cycle by inserting a
                wait state (Tw).

DT/R            DATA TRANSMIT/RECEIVE indicates the direction of the
                data transfer to and from the bus. It is low during Ta
                and Td cycles for a read or interrupt acknowledgement;
                it is high during Ta and Td cycles for a write. DT/R
                never changes state when DEN is asserted. DT/R is
                driven HIGH during reset.

BLAST/FAIL      BURST LAST indicates the last data cycle (Td) of a
                burst access. It is asserted low during the last Td and
                associated Tw cycles in a burst access.

                INITIALIZATION FAILURE indicates that the processor has
                failed to initialize correctly. The failure state is
                indicated by a combination of BLAST asserted and both
                BE signals not asserted. This condition occurs after
                RESET is deasserted and before the first bus
                transaction begins. FAIL is asserted while the
                processor performs a self-test. If the self-test
                completes successfully, then FAIL is deasserted. Next,
                the processor performs a zero checksum on the first
                eight words of memory. If it fails, FAIL is asserted
                for a second time and remains asserted; if it passes,
                system initialization continues and FAIL remains
                deasserted.

I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = 3-State.
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Table 4. 80960SA and 80960SB Pin Description: Bus Signals (Continued) 


Symbol          Name and Function

RESET           RESET clears the internal logic of the processor and
                causes it to reinitialize.

                During RESET assertion, the input pins are ignored
                (except for INT0, INT1, INT3, LOCK), the tri-state
                output pins are placed in a HIGH impedance state
                (except for DT/R, DEN, and AS), and other output pins
                are placed in their non-asserted state. RESET must be
                asserted for at least 41 CLK2 cycles for a predictable
                reset. Optionally, for a synchronous reset, the LOW to
                HIGH transition of RESET should occur after the rising
                edge of both CLK2 and the external bus clock, and
                before the next rising edge of CLK2.

                The interrupt pins indicate the initialization sequence
                executed. Typical initialization requires driving only
                INT0 and INT3 to a HIGH state. The reset conditions
                follow (cells left blank are not legible in the source):

                INT0  INT1  INT3  LOCK  Action Taken
                1                       Run self-test (core initialization)
                0     0     1     1     Disable self-test
                0     1     X     X     Reserved
                X     X     0     X     Reserved
                                        ONCE mode (see LOCK pin)

BE1, BE0        BYTE ENABLE LINES specify which data bytes (up to two)
                on the bus take part in the current bus cycle. BE1
                corresponds to AD15-AD8 and BE0 corresponds to AD7-AD1,
                D0. The byte enable lines are asserted appropriately
                during each data cycle.

FAIL            INITIALIZATION FAILURE indicates that the processor has
                failed to initialize correctly. The failure state is
                indicated by a combination of BLAST asserted and both
                BE signals not asserted. This condition occurs after
                RESET is deasserted and before the first bus
                transaction begins. FAIL is asserted while the
                processor performs a self-test. If the self-test
                completes successfully, then FAIL is deasserted. Next,
                the processor performs a zero checksum on the first
                eight words of memory. If it fails, FAIL is asserted
                for a second time and remains asserted; if it passes,
                system initialization continues and FAIL remains
                deasserted.

INT0            INTERRUPT 0 indicates a pending interrupt. The bus
                interrupt control register determines in which way the
                signal should be interpreted. To signal an interrupt
                request in a synchronous system, this pin (as well as
                the other interrupt pins) must be enabled by being
                deasserted for at least one bus cycle and then asserted
                for at least one additional bus cycle; in an
                asynchronous system, the pin must remain deasserted for
                at least two bus cycles and then be asserted for at
                least two more bus cycles. INT0 is sampled during RESET
                to determine if the self-test sequence is to be
                executed.

INT1            INTERRUPT 1 indicates a direct interrupt, like INT0.
                INT1 is sampled during RESET to determine if the
                self-test sequence is to be executed.

INT2/INTR       INTERRUPT 2/INTERRUPT REQUEST: The interrupt control
                register determines how this pin is interpreted. If
                INT2, it has the same interpretation as the INT0 and
                INT1 pins. If INTR, it is used to receive an interrupt
                request from an external 8259A compatible interrupt
                controller.

INT3/INTA       INTERRUPT 3/INTERRUPT ACKNOWLEDGE: The interrupt
                control register determines how this pin is
                interpreted. If INT3, it has the same interpretation as
                the INT0 and INT1 pins. If INTA, it is used as an
                output to control interrupt-acknowledge bus
                transactions. The INTA output is latched on-chip and
                remains valid during Td cycles. INT3 must be pulled to
                a HIGH state during RESET.

I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = 3-State.
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| Table 4. 80960SA and 80960SB Pin Description: Bus Signals (Continued) 

Symbol          Name and Function

LOCK            BUS LOCK prevents other bus masters from gaining
                control of the bus following the current cycle (if they
                would assert LOCK to do so). LOCK is used by the
                processor or any bus agent when it performs indivisible
                Read/Modify/Write (RMW) operations. Do not leave LOCK
                unconnected. It must be pulled HIGH for the processor
                to function properly.

                For a read that is designated as an RMW-read, LOCK is
                examined. If asserted, the processor waits until it is
                not asserted; if not asserted, the processor asserts
                LOCK during the Ta cycle and leaves it asserted.

                A write that is designated as an RMW-write deasserts
                LOCK in the Ta cycle. During the time LOCK is asserted,
                a bus agent can perform a normal read or write but no
                RMW operations. LOCK is also held asserted during an
                interrupt-acknowledge transaction.

                ONCE MODE: The LOCK pin is sampled during reset. If it
                is asserted LOW at the end of RESET, all outputs will
                be 3-stated until the part is reset. ONCE MODE is used
                in conjunction with an ICE.

HOLD            HOLD: HOLD indicates a request from a secondary bus
                master to acquire the bus. When the processor receives
                HOLD and grants another master control of the bus, it
                floats its tri-state bus lines and then asserts HLDA
                and enters the Th state. When HOLD is deasserted, the
                processor will deassert HLDA and go to either the Ti or
                Ta state.

HLDA            HOLD ACKNOWLEDGE: HLDA indicates that bus control has
                been relinquished to another bus master. This signal is
                always driven. At RESET it is driven LOW.

N.C.            NOT CONNECTED indicates pins should not be connected.
                Never connect any pin marked N.C.

I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = 3-State.
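
The LOCK rules given in Table 4 for Read/Modify/Write operations can be modeled as a small state machine. The Python sketch below is illustrative only: the class, method names and agent labels are hypothetical, and the model ignores bus timing (Ta/Td cycles) and the interrupt-acknowledge case.

```python
class LockLine:
    """Toy model of the shared LOCK signal used for RMW operations."""

    def __init__(self):
        self.asserted = False
        self.owner = None   # agent that asserted LOCK (bookkeeping for the model)

    def rmw_read(self, agent):
        """Try to start an RMW. Returns False if the agent must wait."""
        if self.asserted:
            return False            # LOCK already asserted: wait and retry
        self.asserted = True        # assert LOCK and leave it asserted
        self.owner = agent
        return True

    def rmw_write(self, agent):
        """Finish an RMW: the RMW-write deasserts LOCK."""
        if not self.asserted or self.owner != agent:
            raise RuntimeError("RMW-write without a matching RMW-read")
        self.asserted = False
        self.owner = None

    def normal_access(self, agent):
        """Normal reads and writes are allowed even while LOCK is asserted."""
        return True
```

In this model a second agent that attempts an RMW-read while LOCK is asserted is held off until the first agent completes its RMW-write, while ordinary accesses proceed.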


ELECTRICAL SPECIFICATIONS


Power and Grounding


The 80960SA and 80960SB are implemented in CHMOS IV technology and
have modest power requirements. Their high clock frequency and
numerous output buffers (address/data, control, error, and arbitration
signals) can cause power surges as multiple output buffers drive new
signal levels simultaneously. For clean on-chip power distribution at
high frequency, 12 Vcc and 13 Vss pins separately feed functional
units of the 80960SA and 80960SB in the package.


Power and ground connections must be made to all power and ground pins
of the 80960SA and 80960SB. On the circuit board, all Vcc pins must be
strapped closely together, preferably on a power plane. Likewise, all
Vss pins should be strapped together, preferably on a ground plane.
These pins may not be connected together within the chip.


Power Decoupling Recommendations


Liberal decoupling capacitance should be placed near the 80960SA and
80960SB. The processor can cause transient power surges when driving
the bus, particularly when it is connected to a large capacitive load.


Low inductance capacitors and interconnects are recommended for best
high frequency electrical performance. Inductance can be reduced by
shortening the board traces between the processor and decoupling
capacitors.
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Connection Recommendations 


For reliable operation, always connect unused inputs to an appropriate
signal level. In particular, if one or more interrupt lines are not
used, they should be deasserted. No inputs should ever be left
floating.


All open-drain outputs require a pullup device. While in some cases a
simple pullup resistor will be adequate, we recommend a network of
pullup and pulldown resistors biased to a valid VIH (≥2.0V) and
terminated in the characteristic impedance of the circuit board.
Figure 6 shows our recommendations for the resistor values for both a
low and high current drive network, which assumes that the circuit
board has a characteristic impedance of 100Ω. The advantage of
terminating the output signals in this fashion is that it limits
signal swing and reduces AC power consumption.
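
A pullup/pulldown pair that presents a chosen bias voltage and a chosen parallel (Thevenin) resistance can be computed with the standard resistor-divider relations. The Python sketch below is a generic calculation, not the source of the resistor values in Figure 6; the 5V supply and 2.5V bias in the example are assumed values chosen only to show the arithmetic.

```python
def thevenin_termination(vcc, v_bias, z0):
    """Return (r_pullup, r_pulldown) in ohms.

    The pair is chosen so that the unloaded divider voltage equals
    v_bias and the parallel combination of the two resistors equals z0.
    """
    if not 0 < v_bias < vcc:
        raise ValueError("bias voltage must lie between 0 and vcc")
    r_pullup = z0 * vcc / v_bias
    r_pulldown = z0 * vcc / (vcc - v_bias)
    return r_pullup, r_pulldown


# Example with assumed values: 5V supply, 2.5V bias, 100-ohm board impedance.
r_up, r_down = thevenin_termination(5.0, 2.5, 100.0)
parallel = r_up * r_down / (r_up + r_down)
bias = 5.0 * r_down / (r_up + r_down)
```

Raising the bias voltage toward the supply lowers the pullup value and raises the pulldown value while the parallel resistance stays at the board impedance.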


Characteristic Curves 


The 80960SA and 80960SB characteristic curves shown in Figures 7
through 10 supply information regarding typical supply currents,
typical current versus frequency, worst case voltage versus output
current on open drain pins and capacitive derating curves.


Figure 7 shows the typical supply current requirements over the
operating temperature range of the processor at supply voltage (Vcc)
of 5V. Figure 8 shows the typical power supply current (Icc) required
by the 80960SA and 80960SB at various operating frequencies when
measured at three input voltage (Vcc) levels.


For a given output current (IOL), the curve in Figure 9 shows the
worst case output low voltage (VOL). Figure 10 shows the typical
capacitive derating curve for the 80960SA and 80960SB measured from
1.5V on the system clock (CLK) to 0.8V on the falling edge and 2.0V on
the rising edge of the bus address/data (AD) signals.


Test Load Circuit


Figure 11 illustrates the load circuit used to test the 80960SA and
80960SB's 3-state pins, and Figure 12 shows the load circuit used to
test the open drain output. The open drain test uses an active load
circuit in the form of a matched diode bridge. Since the open-drain
output sinks current, only the IOL legs of the bridge are necessary
and the IOH legs are not used. When the 80960SA and 80960SB driver
under test is turned off, the output pin is pulled up to VREF (i.e.,
VOH). Diode D1 is turned off and the IOL current source flows through
diode D2.


When the 80960SA and 80960SB open-drain driver under test is on, diode
D1 is also on, and the voltage on the pin being tested drops to VOL.
Diode D2 turns off and IOL flows through diode D1.


Low Drive Network (open-drain output):
• VOH = 2.45V to 3.0V
• IOL = 9.5 mA to 12 mA

High Drive Network (open-drain output):
• VOH = 2.48V to 3.0V
• IOL = 16 mA to 20 mA

Figure 6. Open Drain Connection Recommendations for 
Low and High Current Drive Networks for the LOCK Pin 
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5.0V  5.5Y 


Figure 7. Typical Supply Current vs Supply Voltage
(axes: Typical Supply Current (mA) vs Supply Voltage (V))

[Figure 8: supply current vs Operating Frequency (MHz), plotted at 4.5V, 5.0V and 5.5V]

Figure 9. Worst Case Voltage vs Output Current on Open-Drain Pin
(axis: Output Low Current (mA))

[Figure 10: capacitive derating curve; axis: Capacitive Load (pF)]

Figure 11. Test Load Circuit for 3-State Output Pins

Figure 12. Test Load Circuit for Open-Drain Output Pins
(IOL tested at 12 and 20 mA; VREF = VCC; D1 and D2 are matched)


ABSOLUTE MAXIMUM RATINGS

Operating Temperature (PLCC) ......... 0°C to +100°C Case
Operating Temperature (QFP) .......... 0°C to +100°C Case
Storage Temperature .................. -65°C to +150°C
Voltage on Any Pin (PLCC) ............ -0.5V to Vcc + 0.5V
Voltage on Any Pin (QFP) ............. -0.25V to Vcc + 0.25V
Power Dissipation .................... 1.9W (16 MHz)

NOTICE: This data sheet contains preliminary information on new
products in production. The specifications are subject to change
without notice. Verify with your local Intel Sales office that you
have the latest data sheet before finalizing a design.

*WARNING: Stressing the device beyond the "Absolute Maximum Ratings"
may cause permanent damage. These are stress ratings only. Operation
beyond the "Operating Conditions" is not recommended and extended
exposure beyond the "Operating Conditions" may affect device
reliability.


DC CHARACTERISTICS 
960SA/SB (10 MHz and 16 MHz): Tcase = 0°C to +100°C, Vcc = 5V ±10% unless otherwise noted.


[vn [input High Votage | 20 | Voo +08 
. 
L 
H 


Output Low Voltage 0.45 
0.45 
0.60 


Output High Voltage [wd 


| Min 
VIL 
Vin 2.0 
Vo ; 
Vo i 
Power Supply Current: = | 
10 MHz—QFP 
10 MHz—PLCC 
16 MHz—PLCC 
Penal 
bos 
ell 
a 


lo. = 2.5mA 
lo. = 12 mA, LOCK Pin 
lo. = 20 mA, LOCK Pin 


All TS, -—2:5mA@4) 


Tcase = 0°C() 
Tcase = 0°C 
Tcase = 0°C 


=a ae 


Note 
r 


Clock Capacitance Lae. _ | fe = 1 MHz(3) 
NOTES:
1. Tcase is specified at 0°C to +100°C for the QFP at 10 MHz and Vcc = 5V ±5%.
2. INT0 has an internal pullup that sources 100 µA.
3. Input, output and clock capacitance are not tested.
4. Not measured on open-drain outputs.
5. LOCK has an internal pullup that sources 100 µA.


AC SPECIFICATIONS

This section describes the AC specifications for the 80960SA and
80960SB pins. All input and output timings are specified relative to
the 1.5V level of the rising edge of CLK2, and refer to the time at
which the signal crosses (for output delay and input setup) 1.5V. All
AC testing should be done with input voltages of 0.4V and 2.4V, except
for the clock (CLK2), which should be tested with input voltages of
0.45V and 0.7 * Vcc. See Figure 13 for timing relationships for the
80960SA and 80960SB signals.


[Figure 13 signal groups, all referenced to the 1.5V level of CLK2:
OUTPUTS (valid output at 1.5V): AD(1:15), A(1:3), D0, A(16:31),
BE(0:1), DEN, BLAST, W/R, HLDA, LOCK, INTA, ALE
INPUTS: AD(1:15), D0, INT0, INT1, INT2/INTR, INT3, HOLD, LOCK, READY]


Figure 13. Drive Levels and Timing Relationships of 80960SA and 80960SB Signals 
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AC Specification Tables 
| 80960SA and 80960SB AC Characteristics (10 MHz) | 


Symbol Test Conditions 


Processor Clock 50 Vin = 1.5V 
Period (CLK2) : 


125 


(e) 


Vr = 10% Point 
= Vor + (VcH — VeL) X 0.1 


Vr = 90% Point | 
= VoL + (VcH — VeL) X 0.9 


Vr = 90% Point to 10% Point(3) 


Processor Clock Low 
Time (CLK2) 
Processor Clock High 
Time (CLK2) 
Processor Clock Fall 
Time (CLK2) 
Processor Clock Rise 
Time (CLK2) 
| TE Output Valid Delay 2 
T6AS AS Output Valid Delay 


ALE Width 24 
ALE Output Valid Delay 


Output Float Delay. 


2 
1 
) 


© 


Vr = 10% Point to 90% Point(3) 


q 
: 


CL = 100 pF (AD) 
_ C, = 100 pF (Controls)() 


nN 
Oo 
= 
” 


£ 


— 
© 


ee 


NOTES:
1. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should 
be no longer than the valid delay. 

2. Meeting RESET setup and hold times is an optional method of synchronizing your clocks. If you decide to use an asyn- 
chronous reset, then synchronizing the clock can be accomplished by using AS. 

3. Processor clock (CLK2) rise time and fall time are not tested. 

4. ICE requires a minimum of 4 ns input hold time. 


— — 
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80960SA and 80960SB AC Characteristics (16 MHz PLCC) 


[Symbot [Parameter | Min | Max | unite | TestGonditons | 
lesa pepe [ese 
Period (CLK2) 

Ne + (VcH — VoL) X 0.1 


Time (CLK2) 
Processor Clock High 
Time (CLK2) 
- Processor Clock Fall 
Time (CLK2) 


Vy = 90% Point 
= VoL + (VcH — Vez) X 0.9. 


Vr = 90% Point to 10% Point(3) 


Vr = 1 Point to 90% Point(3) 


dae 
[oe Doeso 
ae 
Tree = 1005 
= fom fs = = 100 pF (AD) | | 
Lied 
ees 
ons | 
pons 
ons’ | 
ie 2 
| ns | 
pins | 


Tese= 

Time on 

[7s [ouputvaiabeay | 2 | 2 

‘Fras [as onpuvasseny [2 [2 

eee 

i 
0 
pee eel 
13 
| 10 
on 
Lo 
== 


Output Float Delay | 


[ie | mpasoue 1 
rs Sema sa 


CL = 100 pF (Controls)(1) 


(Note 4) 


sea Set meegi 
Kea 
NOTES:
1. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should
be no longer than the valid delay.
2. Meeting RESET setup and hold times is an optional method of synchronizing your clocks. If you decide to use an
asynchronous reset, then synchronizing the clock can be accomplished by using AS.
3. Processor clock (CLK2) rise time and fall time are not tested.
4. ICE requires a minimum of 4 ns input hold time.
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A(4:15)/D(0:15) 
A(1:3) 
BE(0:1) 


A(16:31) 


ALE - 


270917-15 


NOTES:
1. The AD and control signals are driven at all times except during a HOLD acknowledge (HLDA asserted), RESET, and
ONCE mode.
2. The AD and control signals may toggle during idle (Ti) or recovery (Tr) cycles.


Figure 14. Timing Relationships of the 80960SA and 80960SB Bus 
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OUTPUTS 


INT0, INT1, INT3, LOCK: Initialization Parameters

NOTES:
1. The A edge is defined as the first rising CLK2 edge after RESET is deasserted meeting the RESET hold and setup
times.
2. Initialization Parameters must be set up at least four CLK2s prior to the first A edge.


Figure 15. RESET Signal Timing


Figure 16. HOLD Timing Relationships 


Design Considerations 


Input hold times can be disregarded by the designer whenever the input
is removed because a subsequent output from the processor is
deasserted (e.g., DEN becomes deasserted).


Whenever the processor generates an output that indicates a transition
into a subsequent state, any outputs that are specified to be 3-stated
in this new state are guaranteed to be 3-stated. For example, in the
Td cycle following a Ta cycle for a read, the minimum output delay of
DEN is 2 ns, but the maximum float time of AD is 20 ns. When DEN is
asserted, however, the AD outputs are guaranteed to have been
3-stated.


Designing for the ICE-960SB 


The 80960SA and 80960SB In-Circuit Emulator assists in debugging
80960SA and 80960SB hardware and software designs. The product
consists of a probe module, cable, control unit and power supply.
Because of the high operating frequency of the 80960SA and 80960SB
systems, the probe module connects directly to the 80960SA and 80960SB
component (EIAJ QFP or PLCC) or a socket for the PLCC.


When designing an 80960SA and 80960SB hardware system that uses the
ICE-960SB to debug the system, several electrical and mechanical
characteristics should be considered. These considerations include
capacitive loading, drive requirement, power requirement, and physical
layout.


The ICE-960SB probe module increases the load capacitance of each line
by up to 25 pF. This load originates from the probe module and is
driven by the 80960SA and 80960SB processor.


To achieve high noise immunity, the ICE-960SB 
probe is powered by the user’s system. The high- 
speed probe circuitry draws up to 1.1A plus the max- 
imum current (Icc) of the 80960SA and 80960SB 
processor. 


The AD bus should not be driven by an external 
source unless DEN is asserted. In addition, the ICE 
requires a minimum data hold time of 4 ns. 


The ICE-960SB probe drives LOCK to a LOW
state during RESET to force the target 80960SA and
80960SB to enter ONCE mode. To guarantee timings,
the ICE requires +5% supply voltage supplied
to the 80960SA and 80960SB. The ICE probe requires
a minimum of 0.25 inches clearance on all
sides of both the EIAJ QFP and PLCC.


Lock Line Termination 
You must terminate the LOCK line as described in
Figure 6 in order for the ICE to function properly.


MECHANICAL DATA 


Package Dimensions and Mounting 


The 80960SA and 80960SB are available in two different
packages: an 80-lead quad flat pack (EIAJ QFP),
shown in Figure 17, and an 84-lead plastic leaded
chip carrier (PLCC), shown in Figure 18.


Pin Assignment 


The QFP and PLCC have different pin assignments.
The QFP pins are numbered in order from 1 to 80
around the package's perimeter. The PLCC pins are
numbered in order from 1 to 84 around the package's
perimeter. Tables 9 and 10 list the function of
each pin in the QFP. Tables 11 and 12 list the function
of each pin in the PLCC.


Vcc and GND connections must be made to multiple
Vcc and GND pins. Each Vcc and GND pin must be
connected to the appropriate voltage or ground and
externally strapped close to the package. We recommend
that you include separate power and
ground planes in your circuit board for power distribution.


NOTE: 
Pins identified as N.C., "No Connect," should never
be connected. The 80960SA and 80960SB QFP
package contains two N.C. pins and the PLCC package
contains six N.C. pins.


Package Thermal Specification 


The 80960SA and 80960SB are specified for operation
when case temperature is within the range 0°C
to +85°C. The case temperature should be measured
at the top center of the package.


The ambient temperature can be calculated from θJC
and θJA by using the following equations:

    TJ = TC + P * θJC
    TA = TJ − P * θJA
    TC = TA + P * [θJA − θJC]


Values for θJA and θJC are given in Table 7 for the
QFP package and in Table 8 for the PLCC package
for various airflows.


Example:

    TA = TC − P * (θJA − θJC)

    TC = Maximum Case Temperature
    P = Maximum Supply Voltage times ICC
        at 100° and 10 MHz
    θJA and θJC = QFP Package Thermal Resistance
        at 0 ft/min airflow

    TA = 51 = 100 − (5.5 * 0.213) * (45.7 − 4)
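The worked example can be checked numerically. The sketch below evaluates the ambient-temperature equation with the example's figures. It assumes a maximum supply voltage of 5.5 V, which is the value that reproduces the stated 51 °C result; the function and variable names are illustrative only.

```python
def ambient_temperature(t_case, power_watts, theta_ja, theta_jc):
    """Ambient temperature in degrees C: TA = TC - P * (thetaJA - thetaJC)."""
    return t_case - power_watts * (theta_ja - theta_jc)


# Assumed inputs taken from the example: 5.5 V supply (assumption),
# Icc = 0.213 A, QFP thermal resistances at 0 ft/min airflow.
power = 5.5 * 0.213
t_ambient = ambient_temperature(100.0, power, theta_ja=45.7, theta_jc=4.0)
print(f"Maximum ambient temperature: {t_ambient:.1f} C")
```

With more airflow, the junction-to-ambient resistance drops and the allowable ambient temperature rises accordingly.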


WAVEFORMS 


Figures 19 through 22 show the waveforms for various
signals on the 80960SA and 80960SB's bus.
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Table 7. QFP Package, Thermal Resistance—C/Watt 


[Parameter 
0 ja Junction to Ambient 
(Case measured in the middle 


| 0 | s0 | 100 | 200 | 400 | 600 | 800 | 
of the top of the package) 


«6B 
(No Heatsink) | 


[“@cuunctiontocase | 40 | na | na | 45 | 55 | na | na | 


NOTES: 

1. This table applies to an 80960SA and 80960SB QFP soldered directly onto a board.
2. θJA = θJC + θCA.
3. Thermal data are based on copper lead frames.




Airflow—ft/min 


9.5 


| 0 | 50 | 200 | 400 | 600 | 800 | 
θJA Junction to Ambient 33 27 23.8 22 20
NOTES:


Table 8. PLCC Package, Thermal Resistance—°C/Watt 
θJC Junction to Case
1. This table applies to an 80960SA and 80960SB PLCC soldered directly onto a board. 


| | Airflow—ft/min 
700 
2. θJA = θJC + θCA.
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: Figure 18. 84-Lead Plastic Leaded Chip Carrier 
Figure 17. 80-Lead EIAJ Quad Flat Pack Package
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Table 9. 80960SA and 80960SB QFP Pinout—In Pin Order 


Table 10. 80960SA and 80960SB QFP Pinout—In Signal Order


Table 11. 80960SA and 80960SB PLCC Pinout—In Pin Order




INT2/INTR 


INT3/INTA 


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Table 12. 80960SA and 80960SB PLCC Pinout—In Signal Order 


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Figure 20. 80960SA and 80960SB Timing 
Showing a Four Word Aligned Read Burst 




Figure 21. 80960SA and 80960SB Double Word Read Timing with Wait States 
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Figure 22. 80960SA and 80960SB Aligned Double Word Write Timing with Wait States 
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| Figure 24. 80960SA and 80960SB Timing with a Three Word Write Burst 
Misaligned by One Byte and One Wait State


i960™ KA/KB PROCESSOR
PRODUCT OVERVIEW


INTRODUCTION 


This chapter provides an overview of the Intel i960 KB
processor (which is part of the i960 K series of
embedded-processor products).


All of the processors in the i960 K series of products
are based on the Intel i960™ architecture. Most of the
information in this overview also applies to the i960
KA processor. The only difference between the i960
KB and i960 KA processors is that the i960 KA processor
does not provide on-chip support for floating-point
operations or operations on decimal numbers.


OVERVIEW OF THE i960™ KB 
ARCHITECTURE 


The i960 KB processor introduced the i960 architecture,
a new 32-bit architecture from Intel. This architecture
has been designed to meet the needs of embedded
applications such as machine control, robotics,
process control, avionics and instrumentation.


The i960 architecture can best be characterized as a
high-performance computing engine. It features
high-speed instruction execution and ease of programming.
It is also easily extensible, allowing processors and
controllers based on this architecture to be conveniently
customized to meet the needs of specific processing and
control applications.


The following are some of the important attributes of
the i960 architecture:


• full 32-bit registers

• high-speed, pipelined instruction execution

• a convenient program execution environment with
  32 general-purpose registers and a versatile set of
  special-function registers

• a highly optimized procedure call mechanism that
  features on-chip caching of local variables and
  parameters

• extensive facilities for handling interrupts and faults

• extensive tracing facilities to support efficient
  program debugging and monitoring

• register scoreboarding and write buffering to permit
  efficient operation when used with lower-performance
  memory subsystems


OVERVIEW OF THE SINGLE 
PROCESSOR SYSTEM 
ARCHITECTURE 


The central processing module, memory module and
I/O module form the natural boundaries for the hardware
system architecture. The modules are connected
together by the high bandwidth 32-bit multiplexed
L-bus, which can transfer data at a maximum sustained
rate of 53 Mbytes per second for an i960 processor
operating at 20 MHz.
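The 53 Mbytes per second figure can be reproduced with simple arithmetic. The sketch below assumes that a four-word (16-byte) burst occupies six clock cycles on the bus. That cycle count is an assumption chosen because it matches the quoted rate; it is not stated in this overview, and the function name is illustrative.

```python
def sustained_rate_mbytes(clock_mhz, bytes_per_burst=16, clocks_per_burst=6):
    """Sustained transfer rate in Mbytes/s for back-to-back bursts.

    clocks_per_burst=6 is an assumed figure (not taken from the text).
    """
    return bytes_per_burst * clock_mhz / clocks_per_burst


print(f"{sustained_rate_mbytes(20):.1f} Mbytes/s at 20 MHz")
```

Under the same assumption, the rate scales linearly with the clock frequency.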


Figure 1 shows a simplified block diagram of one possible
system configuration. The heart of this system is the
i960 KB processor, which fetches instructions, executes
code, manipulates stored information and interacts
with I/O devices. The high bandwidth L-bus connects
the i960 KB processor to memory and I/O modules.
The i960 KB processor stores system data, instructions
and programs in the memory module. By accessing various
peripheral devices in the I/O module, the i960 KB
processor supports communication to terminals, modems,
printers, disks and other I/O devices.


i960™ KB Processor and the L-Bus


The i960 KB processor performs bus operations using
multiplexed address and data signals, and provides all
the necessary control signals. For example, standard
control signals, such as Address Latch Enable (ALE),
Address/Data Status (ADS), Write/Read Command
(W/R), Data Transmit/Receive (DT/R) and Data Enable
(DEN), are provided by the i960 KB processor.
The i960 KB processor also generates byte enable signals
that specify which bytes on the 32-bit data lines are
valid for the transfer.
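As an illustration of how byte enables select lanes on a 32-bit data bus, the sketch below computes which of the four byte lanes carry valid data for an access of a given size. It is a logical model only: the lane numbering (lane 0 holds the byte at the lowest address) and the function name are assumptions, and the electrical polarity of the pins is not modeled.

```python
def byte_enables(address, size):
    """Return four booleans, one per byte lane; True means the lane is valid.

    Assumes lane 0 corresponds to the lowest-addressed byte of the word.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    offset = address % 4
    if offset + size > 4:
        raise ValueError("access crosses a word boundary")
    return [offset <= lane < offset + size for lane in range(4)]


print(byte_enables(0x1002, 2))   # upper half-word of the word at 0x1000
```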


The L-bus supports burst transactions, which access up
to four data words at a maximum rate of one word per
clock cycle. The i960 KB processor uses the two low-order
address lines to indicate how many words are to
be transferred. The i960 KB processor performs burst
transactions to load the on-chip 512-byte instruction
cache to minimize memory accesses for instruction
fetches. Burst transactions can also be used for data
access.


To transfer control of the bus to an external bus master,
the i960 KB provides two arbitration signals: hold request
(HOLD) and hold acknowledge (HLDA). After
receiving HOLD, the processor grants control of the
bus to an external master by asserting HLDA.


Order Number: 272030-001 


i960™ KB 
PROCESSOR 


The i960 KB processor provides a flexible interrupt 
structure by using an on-chip interrupt controller, an 
external interrupt controller or both. The type of inter- 
rupt structure is specified by an internal interrupt vec- 
tor register. For a system with multiple processors, 
another method is available, called inter-agent commu- 
nication (IAC) where a processor can interrupt another 
processor by sending an IAC message. 


Memory Module 


A memory module can consist of a memory controller,
Erasable Programmable Read Only Memory
(EPROM), and static or dynamic Random Access
Memory (RAM). The memory controller first conditions
the L-bus signals for memory operation. It demultiplexes
the address and data lines, generates the chip
select signals from the address, detects the start of the
cycle for burst mode operation and latches the byte
enable signals.


The memory controller generates the control signals for
EPROM, SRAM and DRAM. Specifically, it provides
the control signals, multiplexed row/column address
and refresh control for dynamic RAMs. The controller
can be designed to accommodate the burst transaction
of the i960 KB processor by using the static column
mode or nibble mode features of the dynamic RAM. In
addition to supplying the operational signals, the controller
generates the READY signal to indicate that
data can be transferred to or from the i960 KB processor.


Figure 1. Basic i960™ KB System Configuration


The i960 KB processor directly addresses up to
4 Gbytes of physical memory. The processor does not
allow burst accesses to cross a 16-byte boundary, to
ease the design of the controller. Each address specifies
a four-byte data word within the block. Individual data
bytes can be accessed by using the four byte-enable signals
from the i960 KB processor. Chapter 5 provides
design guidelines for the memory controller.
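The 16-byte boundary rule can be illustrated with a short sketch that splits a word-aligned transfer into bursts of at most four words, none of which crosses a 16-byte block. This models the rule as described here; the function name and the tuple representation of a burst are hypothetical and do not describe the processor's internal logic.

```python
WORD_BYTES = 4        # one data word on the 32-bit bus
MAX_BURST_WORDS = 4   # a burst carries up to four words
BLOCK_BYTES = 16      # bursts may not cross a 16-byte boundary


def split_into_bursts(address, word_count):
    """Return a list of (start_address, words) bursts for the transfer."""
    if address % WORD_BYTES:
        raise ValueError("address must be word aligned")
    bursts = []
    while word_count > 0:
        # Words left before the next 16-byte boundary.
        words_to_boundary = (BLOCK_BYTES - address % BLOCK_BYTES) // WORD_BYTES
        words = min(word_count, MAX_BURST_WORDS, words_to_boundary)
        bursts.append((address, words))
        address += words * WORD_BYTES
        word_count -= words
    return bursts


for start, words in split_into_bursts(0x1008, 4):
    print(f"burst at {start:#x}: {words} word(s)")
```

A four-word transfer that starts in the middle of a block is therefore issued as two shorter bursts, which lets a memory controller treat each 16-byte block independently.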


I/O Module


The I/O module consists of the I/O components and
the interface circuit. I/O components can be used to
allow the i960 KB processor to use most of its clock
cycles for computational and system management activities.
Time consuming tasks can be off-loaded to specialized
slave-type components, such as the 8259A Programmable
Interrupt Controller or the 82530 Serial
Communication Controller. Some tasks may require a
master-type component, such as the 82586 Local Area
Network Control.


The interface circuit performs several functions. It
demultiplexes the address and data lines, generates the
chip select signals from the address, produces the I/O
read or I/O write command from the processor's W/R
signal, latches the byte enable signals and generates the
READY signals. Since some of these functions are
identical to those of the memory controller, the same
logic can be used for both interfaces. For master-type
peripherals that operate on a 16-bit data bus, the interface
circuit translates the 32-bit data bus to a 16-bit
data bus.


The i960 KB processor uses memory-mapped addresses
to access I/O devices. This allows the CPU to use many
of the same instructions to exchange information with
both memory and peripheral devices. Thus, the powerful
memory-type instructions can be used to perform 8-,
16- and 32-bit data transfers.


HIGH PERFORMANCE PROGRAM
EXECUTION


Much of the design of the i960 architecture has been
aimed at maximizing the processor's computational
and data processing speed through the use of increased
parallelism. The following paragraphs describe several
of the mechanisms and techniques used to accomplish
this goal.


Load and Store Model


One of the more important features of the i960 archi- 
tecture is its performance of most operations on oper- 
ands in registers, rather than in memory. For example, 
all arithmetic, logic, comparison, branching and bit op- 
erations are performed with registers and literals. 


This feature provides two benefits. First, it increases 
program execution speed by minimizing the number of 
memory accesses necessary to execute a program. Sec- 
ond, it reduces the memory latency encountered when 
using slower, lower-cost memory parts. 


To support this concept, the architecture provides a
generous supply of general-purpose registers. For each
procedure, 32 registers are available, 28 of which are
available for general use. These registers are divided
into two types: global and local. Both types of registers
can be used for general storage of operands. The only
difference is that global registers retain their contents
across procedure boundaries, whereas the processor
allocates a new set of local registers each time a new
procedure is called.


The architecture also provides a set of fast, versatile
load and store instructions. These instructions allow
burst transfers of 1, 2, 4, 8, 12 or 16 bytes of information
between memory and the registers.


On-Chip Caching of Code and Data


To further reduce memory accesses, the architecture
offers two mechanisms for caching code and data on
chip: an instruction cache and multiple sets of local
registers. The instruction cache allows prefetching of
blocks of instructions from memory. This helps ensure
that the instruction execution pipeline is supplied with
a steady stream of instructions. It also reduces the
number of memory accesses required when performing
iterative operations such as loops. The architecture
allows the size of the instruction cache to vary. For the
i960 KB processor, it is 512 bytes.


To optimize the architecture's procedure call mechanism,
the processor provides multiple sets of local registers.
This allows the processor to perform procedure
calls without having to write the local registers out to
the stack in memory. The number of register sets depends
on the processor implementation. The i960 KB
processor provides four sets of local registers.
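The effect of keeping several local register sets on chip can be sketched with a toy model. It counts how many register sets must be written to or read from memory for a given nesting depth of calls. The replacement policy shown (save the oldest set when all on-chip sets are in use, reload a caller's set on return when it is no longer on chip) is an assumption made for illustration, as are all class and function names.

```python
class LocalRegisterSets:
    """Toy model of on-chip local register sets, one per active procedure."""

    def __init__(self, on_chip_sets=4):
        self.on_chip_sets = on_chip_sets
        self.depth = 1      # active procedures, including the current one
        self.resident = 1   # how many of their register sets are on chip
        self.spills = 0     # sets written out to the stack in memory
        self.fills = 0      # sets read back from memory

    def call(self):
        self.depth += 1
        if self.resident == self.on_chip_sets:
            self.spills += 1        # oldest set is saved to make room
        else:
            self.resident += 1

    def ret(self):
        if self.depth == 1:
            raise RuntimeError("return from the outermost procedure")
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            self.fills += 1         # the caller's set must be reloaded
            self.resident = 1


def memory_transfers(call_depth, on_chip_sets=4):
    """Nest call_depth calls, return from all of them, report (spills, fills)."""
    model = LocalRegisterSets(on_chip_sets)
    for _ in range(call_depth):
        model.call()
    for _ in range(call_depth):
        model.ret()
    return model.spills, model.fills


print(memory_transfers(3))   # shallow nesting stays on chip in this model
print(memory_transfers(5))   # deeper nesting forces memory traffic
```

In this model, call chains that stay within the on-chip sets generate no register traffic to memory at all.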


Overlapped Instruction Execution 


The i960 architecture also enhances program execution
speed by overlapping the execution of some instructions.
In the i960 K series of processors, this is
accomplished through register scoreboarding.


Register scoreboarding permits instruction execution to
continue while data is being fetched from memory.
When a load instruction is executed, the processor sets
one or more scoreboard bits to indicate the target registers
to be loaded. After the target registers are loaded,
the scoreboard bits are cleared. While the target registers
are being loaded, the processor is allowed to execute
other instructions that do not use these registers.


The processor uses the scoreboard bits to ensure that
the target registers are not used until the load is complete.
(Scoreboard bits are checked transparently to
software.) This technique allows code to be arranged
so that some instructions are executed in zero
clock cycles (that is, executed for free).
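A small timing model shows why instruction order matters under register scoreboarding. In this sketch each instruction issues in one clock, a load marks its target register busy for a fixed number of additional clocks, and any instruction that touches a busy register waits. The three-clock load latency, the instruction tuples and the register choices are illustrative assumptions, not figures from the processor specification.

```python
def run(program, load_latency=3):
    """Return total clocks for a list of (op, dest, sources) instructions."""
    busy_until = {}   # register -> clock at which a pending load completes
    clock = 0
    for op, dest, sources in program:
        # Wait until no register used by this instruction is scoreboarded.
        needed = list(sources) + [dest]
        clock = max([busy_until.get(reg, 0) for reg in needed] + [clock])
        if op == "load":
            busy_until[dest] = clock + 1 + load_latency
        clock += 1    # the instruction itself issues in one clock
    return max([clock] + list(busy_until.values()))


load = ("load", "r4", ["g0"])
add_a = ("add", "r5", ["r6", "r7"])   # independent of the load
add_b = ("add", "r8", ["r6", "r7"])   # independent of the load
use = ("add", "r9", ["r4", "r6"])     # needs the loaded value

overlapped = run([load, add_a, add_b, use])
stalled = run([load, use, add_a, add_b])
print(overlapped, stalled)
```

When the independent adds are placed between the load and its first use, they execute during the load latency and add nothing to the total, which is the sense in which they are "free".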


Single-Clock Instructions 


The i960 architecture is designed to let a processor execute
commonly used instructions, such as moves, adds,
subtracts, logical operations and branches, in a minimum
number of clock cycles (preferably one cycle).
The architecture supports this concept in several
ways. For example, the load and store model described
earlier eliminates the clock cycles required to perform
memory-to-memory operations, by concentrating on
register-to-register operations.


In addition, all of the instructions in the i960 architecture
are 32 bits long and aligned on 32-bit boundaries.
This lets instructions be decoded in one clock cycle,
and eliminates the need for an instruction-alignment
stage in the pipeline.


The i960 KB processor takes full advantage of these
features of the architecture, resulting in more than 50
instructions that can be executed in a single clock cycle.


Efficient Interrupt Model


The i960 architecture provides an efficient mechanism
for servicing interrupts from external sources. To handle
interrupts, the processor maintains an interrupt table
of 248 interrupt vectors, 240 of which are available
for general use. When an interrupt is signaled, the processor
uses a pointer to the interrupt table to perform an
implicit call to an interrupt handler procedure. In performing
this call, the processor automatically saves the
state of the processor prior to receiving the interrupt,
performs the interrupt routine, then restores the state of
the processor. A separate interrupt stack is also provided
to segregate interrupt handling from application
programs.


The interrupt handling facilities also allow interrupts to
be evaluated by priority. The processor is then able to
store interrupt vectors that are lower in priority than
the current processor task in a pending interrupt section
of the interrupt table. The processor checks and
services the pending interrupts at defined times.


SIMPLIFIED PROGRAMMING
ENVIRONMENT


Because of their streamlined execution environment,
processors based on the i960 architecture are particularly
easy to program. The following paragraphs describe
some of the architecture features that simplify
programming.


Highly Efficient Procedure Call
Mechanism


The procedure call mechanism makes procedure calls
and parameter passing between procedures simple and
compact. Each time a call instruction is issued, the
processor automatically saves the current set of local
registers and allocates a new set for the called procedure.
Likewise, on a return from a procedure, the current
set of local registers is deallocated and the local
registers for the procedure being returned to are restored.
This means a program never has to explicitly
save and restore those local variables that are stored in
local registers.


Versatile Instruction Set and
Addressing


The selection of instructions and addressing modes also
simplifies programming. A full set of load, store, move,
arithmetic, comparison and branch instructions is
provided, with operations on both integer and ordinal
data types. Operations on bits and bit strings are simplified
by a complete set of Boolean and bit-field instructions.


The addressing modes are efficient and straightforward,
while at the same time providing the necessary indexing
and scaling modes required to address complex arrays
and record structures. The large 4-gigabyte address
space provides ample room to store programs and data.
The availability of 32 addressing lines allows some address
lines to be memory-mapped to control hardware
functions.


Extensive Fault Handling Capability


To aid in program development, the i960 architecture
defines a wide range of faults that the processor detects,
including arithmetic faults, invalid operations, invalid
operands and machine faults. When a fault is detected,
the processor makes an implicit call to a fault handler
routine, in a way similar to the interrupt mechanism
described previously. The information collected for
each fault allows program developers to quickly correct
faulting code, and allows automatic recovery from
some faults.


Debugging and Monitoring 


To support debugging systems, the i960 architecture
provides a mechanism for monitoring processor activity
by means of trace events. When the processor detects a
trace event, it signals a trace fault and calls a fault handler.
Intel provides several tools that use this feature,
including an in-circuit emulator (ICE) device.


SUPPORT FOR ARCHITECTURAL
EXTENSIONS


The i960 architecture provides several features that en- 
able processors based on this architecture to be easily 
customized to meet the needs of specific embedded ap- 
plications, such as signal processing, array processing 
or graphics processing. 




The most important of these features is the set of 32
special function registers. These registers provide a con-
venient interface to circuitry in the processor or pins 
that can be connected to external hardware. They can 
be used to control timers, to perform operations on spe- 
cial data types or to perform I/O functions. The special 
function registers are similar to the global registers. 
They can be addressed by all of the register access in- 
structions. 


EXTENSIONS INCLUDED IN THE 
i960™ K SERIES PROCESSORS 


The i960 K series of processors provides a complete
implementation of the i960 architecture, plus several
extensions to that architecture. These extensions fall
into two categories: floating-point processing and
inter-agent communication.


On-Chip Floating Point 


The i960 KB processor provides a complete implementation
of the IEEE standard for binary floating-point
arithmetic (IEEE 754-1985). This implementation includes
a full set of floating-point operations, including
add, subtract, multiply, divide, trigonometric functions
and logarithmic functions. These operations are
performed on single precision (32-bit), double precision
(64-bit) and extended precision (80-bit) real numbers.


One of the benefits of this implementation is that the 
floating-point handling facilities are integrated into the 
normal instruction execution environment. Single and 
double precision floating-point values are stored in the 
same registers as non-floating point values. Four 80-bit 
floating-point registers are provided to hold extended- 
precision values. 


Interagent Communication 


All of the processors in the i960 K series provide an 
inter-agent communication (IAC) mechanism, allowing 
agents connected to the processor’s bus to communi- 
cate with one another. This mechanism operates simi- 
larly to the interrupt mechanism, except that IAC mes- 
sages are passed through dedicated sections of memory. 
The sort of tasks handled with IAC messages are proc- 
essor reinitialization, stopping the processor, purging 
the instruction cache and forcing the processor to check 
pending interrupts. 


80960KA
EMBEDDED 32-BIT PROCESSOR


■ High-Performance Embedded Architecture
  — 25 MIPS Burst Execution at 25 MHz
  — 9.4 MIPS* Sustained Execution at 25 MHz
■ 512-Byte On-Chip Instruction Cache
  — Direct Mapped
  — Parallel Load/Decode for Uncached Instructions
■ Pin Compatible with 80960KB
■ Multiple Register Sets
  — Sixteen Global 32-Bit Registers
  — Sixteen Local 32-Bit Registers
  — Four Local Register Sets Stored On-Chip
  — Register Scoreboarding
■ Built-In Interrupt Controller
  — 32 Priority Levels, 256 Vectors
  — 3.4 µs Latency @ 25 MHz
■ Easy to Use, High Bandwidth 32-Bit Bus
  — 66.7 Mbytes/s Burst
  — Up to 16 Bytes Transferred per Burst
■ 4-Gigabyte, Linear Address Space
■ 132-Lead Pin Grid Array (PGA) Package
■ 132-Lead Plastic Quad Flat Pack (PQFP)
■ Uses 85C960 Bus Controller
■ Supported by 27960KX Burst EPROMs


The 80960KA is a member of Intel's new 32-bit processor family, the i960 series, which is designed especially
for embedded applications. It is based on the family's high performance, common core architecture, and
includes a 512-byte instruction cache and a built-in interrupt controller. The 80960KA has a large register set,
multiple parallel execution units and a high-bandwidth, burst bus. Using advanced RISC technology, this high
performance processor is capable of execution rates in excess of 9.4 million instructions per second.* The
80960KA is well-suited for a wide range of embedded applications, including laser printers, image processing,
industrial control, robotics and telecommunications.


*Relative to Digital Equipment Corporation’s VAX-11/780** at 1 MIPS 
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Figure 1. The 80960KA’s Highly Parallel Microarchitecture 


**VAX-11™ is a trademark of Digital Equipment Corporation.
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THE 960 SERIES 


The 80960KA is a member of a new family of 32-bit 
microprocessors from Intel known as the i960 Se- 
ries. This series was especially designed to serve 
the needs of embedded applications. The embed- 
ded market includes applications as diverse as in- 
dustrial automation, avionics, image processing, 
graphics, robotics, telecommunications and automo- 
biles. These types of applications require high 
integration, low power consumption, quick interrupt 
response times and high performance. Since time to 
market is critical, embedded microprocessors need 
to be easy to use in both hardware and software 
designs. 
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NOTES: 


1. Register g15 is reserved for stack management functions.
2. Registers r0, r1, and r2 are reserved for stack management functions.
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All members of the 80960 series share a common 
core architecture which utilizes RISC technology so 
that, except for special functions, the family mem- 
bers are object code compatible. Each new proces- 
sor in the series will add its own special set of func- 
tions to the core to satisfy the needs of a specific 
application or range of applications in the embedded 
market. For example, future processors may include 
a DMA controller, a timer or an A/D converter. 


Software written for the 80960KA will run without 
modification on any other member of the 80960 fam- 
ily. It is also pin-compatible with the 80960KB, which 
includes an integrated floating-point unit, and the 
80960MC, a military-grade version with support for 
multitasking, memory management, multiprocessing 
and fault tolerance. 


ADDRESS 
SPACE 


Figure 2. Register Set 
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KEY PERFORMANCE FEATURES 


The 80960KA's architecture is based on the most
recent advances in RISC technology and is grounded
in Intel's long experience in designing embedded
controllers. Many features contribute to the
80960KA's exceptional performance:


1. Large Register Set. Modern compilers can take
advantage of a large number of registers to optimize
execution speed. For maximum flexibility, the
80960KA provides 32 32-bit registers and four 80-bit
floating-point registers. (See Figure 2.)


2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most programs,
so that execution speed can be greatly improved by
ensuring that these core instructions execute in as
short a time as possible. The most-frequently executed
instructions such as register-register moves,
add/subtract, logical operations, and shifts execute
in one to two cycles. (Table 1 contains a list of
instructions.)


3. Load/Store Architecture. One way to improve
execution speed is to reduce the number of times
that the processor must access memory to perform
an operation. Like other processors based on RISC
technology, the 80960KA has a Load/Store architecture:
only the LOAD and STORE instructions reference
memory; all other instructions operate on
registers.


Figure 3. Instruction Formats


Table 1. 80960KA Instruction Set


Load                  Add                   Exclusive Or          Set Bit
Store                 Subtract              Not Or                Clear Bit
Move                  Multiply              Or Not                Not Bit
Load Address          Divide                Nor                   Check Bit
                      Remainder             Exclusive Nor         Alter Bit
                      Modulo                Not                   Scan for Bit
                      Shift                 Nand                  Scan over Bit
                                            Rotate                Extract
                                                                  Modify


Compare               Unconditional Branch  Call                  Conditional Fault
Conditional Compare   Conditional Branch    Call Extended         Synchronize Faults
Compare and           Compare and Branch    Call System
  Increment                                 Return
Compare and                                 Branch and Link
  Decrement


Debug                 Miscellaneous                Decimal

Modify Trace          Atomic Add                   Move
  Controls            Atomic Modify                Add with Carry
Mark                  Flush Local Registers        Subtract with Carry
Force Mark            Modify Arithmetic Controls
                      Scan Byte for Equal
                      Test Condition Code
                      Modify Process Controls


Synchronous

Synchronous Load
Synchronous Move


4. Simple Instruction Formats. All instructions in
the 80960KA are 32 bits long and must be aligned
on word boundaries. This alignment makes it possible
to eliminate the instruction-alignment stage in
the pipeline. To simplify the instruction decoder further,
there are only five instruction formats and each
instruction uses only one format. (See Figure 3.)


5. Overlapped Instruction Execution. A load operation
allows execution of subsequent instructions to
continue before the data has been returned from
memory, so that these instructions can overlap the
load. The 80960KA manages this process transparently
to software through the use of a register scoreboard.
Conditional instructions also make use of a
scoreboard so that subsequent unrelated instructions
can be executed while the conditional instruction
is pending.


6. Integer Execution Optimization. When the result
of an operation is used as an operand in a subsequent
calculation, the value is sent immediately to
its destination register. Yet at the same time, the
value is put back on a bypass path to the ALU,
thereby saving the time that otherwise would be required
to retrieve the value for the next operation.


7. Bandwidth Optimizations. The 80960KA gets
optimal use of its memory bus bandwidth because
the bus is tuned for use with the cache: the line size
of the instruction cache matches the maximum burst
size for instruction fetches. The 80960KA automatically
fetches four words in a burst and stores them
directly in the cache. Due to the size of the cache
and the fact that it is continually filled in anticipation
of needed instructions in the program flow, the
80960KA is exceptionally insensitive to memory wait
states. In fact, each wait state causes only a 7%
degradation in system performance. The benefit is
that the 80960KA will deliver outstanding performance
even with a low cost memory system.
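The wait-state figure can be turned into a rough estimate of relative performance. The sketch below assumes the 7% loss applies linearly per wait state relative to a zero-wait-state system. The text does not say whether the effect is linear or compounding, so this is an assumed reading, and the function name is illustrative.

```python
def relative_performance(wait_states, loss_per_wait_state=0.07):
    """Fraction of zero-wait-state performance, assuming a linear 7% loss."""
    return max(0.0, 1.0 - loss_per_wait_state * wait_states)


for n in range(4):
    print(f"{n} wait state(s): {relative_performance(n):.0%} of peak")
```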


8. Cache Bypass. If there is a cache miss, the processor
fetches the needed instruction, then sends it
on to the instruction decoder at the same time it
updates the cache. Thus, no extra time is taken to
load and read the cache.


Memory Space and Addressing Modes


The 80960KA offers a linear programming environment
so that all programs running on the processor
are contained in a single address space. The maximum
size of the address space is 4 Gigabytes (2^32
bytes).


For ease of use, the 80960KA has a small number of
addressing modes, but includes all those necessary
to ensure efficient compiler implementations of high-level
languages such as C, Fortran and Ada. Table 2
lists the memory addressing modes.


Data Types


The 80960KA recognizes the following data types:


Numeric:
• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers


Non-Numeric:
• Bit
• Bit Field
• Triple-Word (96 bits)
• Quad-Word (128 bits)


Large Register Set


The programming environment of the 80960KA includes
a large number of registers. In fact, 32 registers
are available at any time. The availability of this
many registers greatly reduces the number of memory
accesses required to execute most programs,
which leads to greater instruction processing speed.


There are two types of general-purpose registers:
local and global. The global registers consist of sixteen
32-bit registers (G0 through G15). These registers
perform the same function as the general-purpose
registers provided in other popular microprocessors.
The term global refers to the fact that these
registers retain their contents across procedure
calls.


The local registers, on the other hand, are procedure
specific. For each procedure call, the 80960KA
allocates 16 local registers (R0 through R15). Each
local register is 32 bits wide.


Multiple Register Sets 


To further increase the efficiency of the register set, 
multiple sets of local registers are stored on-chip. 
This cache holds up to four local register frames, 
which means that up to three procedure calls can be 
made without having to access the procedure stack 
resident in memory. 


Although programs may have procedure calls nested many calls deep, a program
typically oscillates back and forth between only two or three levels. As a
result, with four stack frames in the cache, the probability of there being
a free frame on the cache when a call is made is very high. In fact, runs of
representative C-language programs show that 80% of the calls are handled
without needing to access memory.


Table 2. Memory Addressing Modes

• 12-Bit Offset
• 32-Bit Offset
• Register-Indirect
• Register + 12-Bit Offset
• Register + 32-Bit Offset
• Register + (Index-Register × Scale-Factor)
• Register × Scale-Factor + 32-Bit Displacement
• Register + (Index-Register × Scale-Factor) + 32-Bit Displacement

Scale-Factor is 1, 2, 4, 8 or 16
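
The addressing modes in Table 2 all reduce to some combination of a base register, a scaled index register and a constant. The following is a minimal sketch of that calculation, not the processor's actual address unit. The function name `effective_address` and its parameters are hypothetical, and wrapping the result to 32 bits is an assumption based on the 4-Gigabyte address space.

```python
MASK32 = 0xFFFFFFFF  # assumed: addresses wrap within the 2**32-byte space


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Return base + index * scale + displacement, modulo 2**32.

    Any term left at its default is simply absent from the mode,
    e.g. Register-Indirect uses only `base`.
    """
    if scale not in (1, 2, 4, 8, 16):
        raise ValueError("Scale-Factor must be 1, 2, 4, 8 or 16")
    return (base + index * scale + displacement) & MASK32


# Register + (Index-Register x Scale-Factor) + 32-Bit Displacement:
# element 3 of an array of 4-byte words located 0x100 bytes past the base.
print(hex(effective_address(base=0x2000, index=3, scale=4, displacement=0x100)))
# prints 0x210c
```

The scale factors correspond to element sizes of 1 to 16 bytes, which is why indexed modes map naturally onto array accesses in C, Fortran or Ada.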


If there are four or more active procedures and a new procedure is called,
the processor moves the oldest set of local registers in the register cache
to a procedure stack in memory to make room for a new set of registers.
Global register G15 is used by the processor as the frame pointer (FP) for
the procedure stack.


Note that the global registers are not exchanged on a procedure call, but
retain their contents, making them available to all procedures for fast
parameter passing. An illustration of the register cache is shown in
Figure 4.

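
The spill behaviour of the register cache can be sketched with a small model. This is an illustration under stated assumptions, not a description of the hardware: the class `LocalRegisterCache` and its methods are hypothetical, the first procedure is assumed to occupy one of the four frames, and a spilled frame is assumed to be refilled only when control returns to it.

```python
class LocalRegisterCache:
    """Toy model of the on-chip cache of local register frames."""

    def __init__(self, frames=4):
        self.frames = frames        # the text gives four on-chip frames
        self.on_chip = []           # frames held on chip, oldest first
        self.in_memory = []         # frames spilled to the procedure stack
        self.memory_accesses = 0    # spills plus refills

    def call(self):
        """Allocate 16 fresh local registers for the called procedure."""
        if len(self.on_chip) == self.frames:
            # Cache full: move the oldest frame out to the procedure stack.
            self.in_memory.append(self.on_chip.pop(0))
            self.memory_accesses += 1
        self.on_chip.append([0] * 16)

    def ret(self):
        """Discard the current frame; refill the caller's frame if spilled."""
        self.on_chip.pop()
        if not self.on_chip and self.in_memory:
            self.on_chip.append(self.in_memory.pop())
            self.memory_accesses += 1


cache = LocalRegisterCache()
cache.call()                    # frame for the initial procedure
for _ in range(3):
    cache.call()                # three nested calls still fit on chip
print(cache.memory_accesses)    # prints 0
cache.call()                    # a fifth active procedure forces a spill
print(cache.memory_accesses)    # prints 1
```

A program that oscillates between two or three call levels never reaches the spill path in this model, which is the effect the text describes.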
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Figure 4. Multiple Register Sets Are Stored On-Chip 
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Instruction Cache ; 


To further reduce memory accesses, the 80960KA 
includes a 512-byte on-chip instruction cache. The 
instruction cache is based on the concept of locality 
of reference; that is, most programs are not usually 
executed in a steady stream but consist of many 
branches and loops that lead to jumping back and 
forth within the same small section of code. Thus, by 
maintaining a block of instructions in a cache, the
number of memory references required to read
instructions into the processor can be greatly reduced.


To load the instruction cache, instructions are 
fetched in 16-byte blocks, so that up to four instruc- 
tions can be fetched at one time. An efficient 
prefetch algorithm increases the probability that an 
instruction will already be in the cache when it is 
needed. 


Code for small loops will often fit entirely within the 
cache, leading to a great increase in processing 
speed since further memory references might not be 
necessary until the program exits the loop. Similarly, 
when calling short procedures, the code for the call- 
ing procedure is likely to remain in the cache, so it 
will be there on the procedure’s return. 
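
The effect of a loop fitting in the cache can be illustrated with a short simulation. This is only a sketch: the text gives the cache size (512 bytes) and the fetch block size (16 bytes), but not the cache organization, so direct-mapped placement is an assumption here, as are the function names and the 4-byte instruction size used to build the trace. Prefetching is not modeled.

```python
CACHE_BYTES = 512
BLOCK_BYTES = 16
NUM_LINES = CACHE_BYTES // BLOCK_BYTES   # 32 lines of four words each


def count_misses(fetch_addresses):
    """Count misses for a sequence of instruction fetch addresses.

    Assumes a direct-mapped cache: each 16-byte block has one possible line.
    """
    lines = [None] * NUM_LINES
    misses = 0
    for addr in fetch_addresses:
        block = addr // BLOCK_BYTES
        slot = block % NUM_LINES
        if lines[slot] != block:
            lines[slot] = block          # fill the line with the new block
            misses += 1
    return misses


def loop_trace(start, length_bytes, iterations):
    """Fetch addresses for a straight-line loop body of 4-byte instructions."""
    return [start + offset
            for _ in range(iterations)
            for offset in range(0, length_bytes, 4)]


# A 256-byte loop fits in the cache: only the first pass misses.
print(count_misses(loop_trace(0x1000, 256, 100)))    # prints 16
# A 1024-byte loop is twice the cache size: every block misses on every pass.
print(count_misses(loop_trace(0x1000, 1024, 100)))   # prints 6400
```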


Register Scoreboarding 


The instruction decoder has been optimized in sev- 
eral ways. One of these optimizations is the ability to 
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do instruction overlapping by means of register 
scoreboarding. 


Register scoreboarding occurs when a LOAD instruction is executed to move a
variable from memory into a register. When the instruction is initiated, a
scoreboard bit on the target register is set. When the register is actually
loaded, the bit is reset. In between, any reference to the register contents
is accompanied by a test of the scoreboard bit to ensure that the load has
completed before processing continues. Since the processor does not have to
wait for the LOAD to be completed, it can go on to execute additional
instructions placed in between the LOAD instruction and the instruction that
uses the register contents, as shown in the following example:


LOAD R4, address 1
LOAD R5, address 2
Unrelated instruction
Unrelated instruction
ADD R4, R5, R6


In essence, the two unrelated instructions between the LOAD and ADD
instructions are executed for free (i.e., take no apparent time to execute)
because they are executed while the register is being loaded. Up to three
instructions can be pending at one time with three corresponding scoreboard
bits set. By exploiting this feature, system programmers and compilers have
a tool for optimizing execution speed.
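
The "executed for free" claim can be checked with a simple cycle count. The sketch below is a simplified model under stated assumptions: one instruction issues per cycle, a load returns its data a fixed number of cycles after issue (`load_latency=3` is a hypothetical figure, not a datasheet value), and the limit of three pending loads is not modeled. The function `run` and the tuple encoding of instructions are illustrative only.

```python
def run(program, load_latency=3):
    """Return the cycle count for an in-order program with a register scoreboard.

    program is a list of (op, dest, sources) tuples. A LOAD sets the
    scoreboard bit of its destination; the bit clears when data returns.
    """
    ready_at = {}   # register -> cycle at which its scoreboard bit clears
    cycle = 0
    for op, dest, sources in program:
        # Stall only if a source register is still scoreboarded.
        for reg in sources:
            cycle = max(cycle, ready_at.get(reg, 0))
        cycle += 1                       # the instruction issues
        if op == "LOAD":
            ready_at[dest] = cycle + load_latency
    # Any load still outstanding must complete before the program is done.
    return max([cycle] + list(ready_at.values()))


with_filler = [
    ("LOAD", "R4", []),
    ("LOAD", "R5", []),
    ("OTHER", "R8", ["R9"]),             # unrelated instruction
    ("OTHER", "R10", ["R11"]),           # unrelated instruction
    ("ADD", "R6", ["R4", "R5"]),
]
without_filler = [
    ("LOAD", "R4", []),
    ("LOAD", "R5", []),
    ("ADD", "R6", ["R4", "R5"]),
]
print(run(with_filler), run(without_filler))   # prints 6 6
```

Under these assumptions both sequences finish in the same number of cycles, so the two unrelated instructions add no apparent time.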
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High Bandwidth Local Bus 


An 80960KA CPU resides on a high-bandwidth ad- 
dress/data bus known as the local bus (L-Bus). The 
L-Bus provides a direct communication path be- 
tween the processor and the memory and I/O sub- 
system interfaces. The processor uses the local bus 
to fetch instructions, manipulate memory, and re- 
spond to interrupts. Its features include: 


• 32-bit multiplexed address/data path


• Four-word burst capability, which allows transfers
  from 1 to 16 bytes at a time


• High bandwidth reads and writes at 66.7 Mbytes
  per second


• Special signal to indicate whether a memory
  transaction can be cached
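
The 66.7 Mbytes per second figure is consistent with a four-word burst at 25 MHz. The sketch below shows the arithmetic under assumptions that this section does not state: a burst is taken to cost one address cycle plus one recovery cycle of overhead (`overhead_cycles=2`), one cycle per data word, and each wait state is assumed to add one cycle per word. The function name and parameters are hypothetical.

```python
def burst_bandwidth_mbytes(clock_mhz, words=4, wait_states=0, overhead_cycles=2):
    """Return sustained burst bandwidth in Mbytes per second.

    Assumed cycle budget: overhead_cycles + words * (1 + wait_states).
    """
    cycles = overhead_cycles + words * (1 + wait_states)
    bytes_transferred = 4 * words        # 32-bit words
    return bytes_transferred * clock_mhz / cycles


print(round(burst_bandwidth_mbytes(25), 1))                  # prints 66.7
print(round(burst_bandwidth_mbytes(25, wait_states=1), 1))   # prints 40.0
```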


Figure 5 identifies the groups of signals which con- 
stitute the L-Bus. Table 4 lists the function of the L- 
Bus and other processor-support signals, such as 
the interrupt lines. 


Interrupt Handling 


The 80960KA can be interrupted in one of two ways: by the activation of one
of four interrupt pins or by sending a message on the processor's data bus.


The 80960KA is unusual in that it automatically handles interrupts on a
priority basis and tracks pending interrupts through its on-chip interrupt
controller. Two of the interrupt pins can be configured to provide 8259A
handshaking for expansion beyond four interrupt lines.
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Debug Features 


The 80960KA has built-in debug capabilities. There 
are two types of breakpoints and six different trace 
modes. The debug features are controlled by two 
internal 32-bit registers, the Process-Controls Word 
and the Trace-Controls Word. By setting bits in 
these control words, a software debug monitor can 
closely control how the processor responds during 
program execution. 


The 80960KA has both hardware and software breakpoints. It provides two
hardware breakpoint registers on-chip which can be set by a special command
to any value. When the instruction pointer matches the value in one of the
breakpoint registers, the breakpoint will fire, and a breakpoint handling
routine is called automatically.


The 80960KA also provides software breakpoints through the use of two
instructions, MARK and FMARK. These instructions can be placed at any point
in a program and will cause the processor to halt execution at that point
and call the breakpoint handling routine. The breakpoint mechanism is easy
to use and provides a powerful debugging tool.


Tracing is available for instructions (single-step execution), calls and
returns, and branching. Each different type of trace may be enabled
separately by a special debug instruction. In each case, the 80960KA
executes the instruction first and then calls a trace handling routine
(usually part of a software debug monitor). Further program execution is
halted until the trace routine is completed. When the trace event handling
routine is completed, instruction execution resumes at the next instruction.
The 80960KA's tracing mechanisms, which are implemented completely in
hardware, greatly simplify the task of testing and debugging software.

Figure 5. Local Bus Signal Groups


FAULT DETECTION 


The 80960KA has an automatic mechanism to 
handle faults. There are ten fault types including 
trace, arithmetic, and floating-point faults. When the 
processor detects a fault, it automatically calls the 
appropriate fault handling routine and saves the cur- 
rent instruction pointer and necessary state informa- 
tion to make efficient recovery possible. The proces- 
sor posts diagnostic information on the type of fault 
to a Fault Record. Like interrupt handling routines, 
fault handling routines are usually written to meet 
the needs of a specific application and are often in- 
cluded as part of the operating system or kernel. 


For each of the ten fault types, there are numerous subtypes that provide
specific information about a fault. For example, a floating-point fault may
have its subtype set to an Overflow or Zero-Divide fault. The fault handler
can use this specific information to respond correctly to the fault.


BUILT-IN TESTABILITY


Upon reset, the 80960KA automatically conducts an
extensive internal test (self-test) of its major blocks
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of logic. Then, before executing its first instruction, it 
does a zero check sum on the first eight words in 
memory to ensure that the system has been loaded 
correctly. If a problem is discovered at any point dur- 
ing the self-test, the 80960KA will assert its FAIL- 
URE pin and will not begin program execution. The 
self-test takes approximately 47,000 cycles to com- 
plete. 
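
The zero checksum over the first eight words can be sketched as follows. This is an illustration only: the text does not say how the hardware forms the sum (for example, how carries are handled), so a plain 32-bit modular sum is assumed here, and both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def passes_zero_checksum(words):
    """Return True if the eight 32-bit words sum to zero modulo 2**32.

    Assumption: a simple modular sum; the real carry handling may differ.
    """
    if len(words) != 8:
        raise ValueError("the check covers exactly eight words")
    return sum(words) & MASK32 == 0


def checksum_word(first_seven):
    """Return the eighth word that makes the modular sum zero."""
    if len(first_seven) != 7:
        raise ValueError("expected seven words")
    return (-sum(first_seven)) & MASK32


# Hypothetical initial memory image: seven arbitrary words plus a fix-up word.
image = [0x00000180, 0x000001B0, 0x00000000, 0x00000210,
         0xFFFFFF10, 0x00000000, 0x00000000]
image.append(checksum_word(image))
print(passes_zero_checksum(image))   # prints True
```

A linker or ROM-building tool would compute the fix-up word in this way so that a correctly loaded image passes the check and a corrupted one asserts FAILURE.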


System manufacturers can use the 80960KA self-test feature during incoming
parts inspection. No special diagnostic programs need to be written, and
the test is both thorough and fast. The self-test capability helps ensure
that defective parts will be discovered before systems are shipped, and once
in the field, the self-test makes it easier to distinguish between problems
caused by processor failure and problems resulting from other causes.


CHMOS


The 80960KA is fabricated using Intel’s CHMOS IV 
(Complementary High Speed Metal Oxide Semicon- 
ductor) process. This advanced technology elimi- 
nates the frequency and reliability limitations of older 
CMOS processes and opens a new era in micro- 
processor performance. It combines the high per- 
formance capabilities of Intel’s industry-leading 
HMOS technology with the high density and low 
power characteristics of CMOS. The 80960KA is 
available at 10, 16, 20 and 25 MHz. 


Table 4a. 80960KA Pin Description: L-Bus Signals


Symbol        Name and Function


CLK2          SYSTEM CLOCK provides the fundamental timing for 80960KA
              systems. It is divided by two inside the 80960KA to generate
              the internal processor clock.


LAD31-LAD0    LOCAL ADDRESS/DATA BUS carries 32-bit physical addresses and
              data to and from memory. During an address (Ta) cycle, bits
              2-31 contain a physical word address (bits 0-1 indicate SIZE;
              see below). During a data (Td) cycle, bits 0-31 contain read
              or write data. The LAD lines are active HIGH and float to a
              high impedance state when not active.

              SIZE, which is comprised of bits 0-1 of the LAD lines during
              a Ta cycle, specifies the size of a burst transfer in words.

                  LAD1   LAD0   Burst Size
                   0      0     1 Word
                   0      1     2 Words
                   1      0     3 Words
                   1      1     4 Words


ALE           ADDRESS-LATCH ENABLE indicates the transfer of a physical
              address. ALE is asserted during a Ta cycle and deasserted
              before the beginning of the Td state. It is active LOW and
              floats to a high impedance state during a hold cycle (Th or
              Thr).


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state
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Table 4a. 80960KA Pin Description: L-Bus Signals (Continued) 


ADS           ADDRESS/DATA STATUS indicates an address state. ADS is
              asserted every Ta state and deasserted during the following
              Td state. For a burst transaction, ADS is asserted again every
              Td state where READY was asserted in the previous cycle.


W/R           WRITE/READ specifies, during a Ta cycle, whether the operation
              is a write or read. It is latched on-chip and remains valid
              during Td cycles.


DT/R          DATA TRANSMIT/RECEIVE indicates the direction of data transfer
              to and from the L-Bus. It is low during Ta and Td cycles for a
              read or interrupt acknowledgement; it is high during Ta and Td
              cycles for a write. DT/R never changes state when DEN is
              asserted (see Timing Diagrams).


DEN           DATA ENABLE is asserted during Td cycles and indicates
              transfer of data on the LAD bus lines.


READY         READY indicates that data on LAD lines can be sampled or
              removed. If READY is not asserted during a Td cycle, the Td
              cycle is extended to the next cycle by inserting a wait state
              (Tw), and ADS is not asserted in the next cycle.


LOCK          BUS LOCK prevents other bus masters from gaining control of
              the L-Bus following the current cycle (if they would assert
              LOCK to do so). LOCK is used by the processor or any bus agent
              when it performs indivisible Read/Modify/Write (RMW)
              operations. Do not leave LOCK unconnected. It must be pulled
              high for the processor to function properly.

              For a read that is designated as an RMW-read, LOCK is
              examined. If asserted, the processor waits until it is not
              asserted; if not asserted, the processor asserts LOCK during
              the Ta cycle and leaves it asserted.

              A write that is designated as an RMW-write deasserts LOCK in
              the Ta cycle. During the time LOCK is asserted, a bus agent
              can perform a normal read or write but no RMW operations.
              LOCK is also held asserted during an interrupt-acknowledge
              transaction.


BE3-BE0       BYTE ENABLE LINES specify which data bytes (up to four) on the
              bus take part in the current bus cycle. BE3 corresponds to
              LAD31-LAD24 and BE0 corresponds to LAD7-LAD0.

              The byte enables are provided in advance of data. The byte
              enables asserted during Ta specify the bytes of the first data
              word. The byte enables asserted during Td specify the bytes of
              the next data word (if any), that is, the word to be
              transmitted following the next assertion of READY. The byte
              enables during the Td cycles preceding the last assertion of
              READY are undefined. The byte enables are latched on-chip and
              remain constant from one Td cycle to the next when READY is
              not asserted.

              For reads, the byte enables specify the byte(s) that the
              processor will actually use. L-Bus agents are required to
              assert only adjacent byte enables (e.g., asserting just BE0
              and BE2 is not permitted), and are required to assert at least
              one byte enable.

              To produce address bits A0 and A1 externally, they can be
              decoded from the byte enables.


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Table 4a. 80960KA Pin Description: L-Bus Signals (Continued) 


Symbol Name and Function 


HOLD/HLDAR    HOLD: If the processor is the primary bus master (PBM), the
              input is interpreted as HOLD, a request from a secondary bus
              master to acquire the bus. When the processor receives HOLD
              and grants another master control of the bus, it floats its
              tri-state bus lines and then asserts HLDA and enters the Th
              state. When HOLD is deasserted, the processor will deassert
              HLDA and go to either the Ti or Ta state.

              HOLD ACKNOWLEDGE RECEIVED: If the processor is a secondary bus
              master (SBM), the input is HLDAR, which indicates, when HOLDR
              output is high, that the processor has acquired the bus.
              Processors and other agents can be told at reset if they are
              the primary bus master (PBM).


HLDA/HOLDR    HOLD ACKNOWLEDGE: If the processor is a primary bus master,
              the output is HLDA, which relinquishes control of the bus to
              another bus master.

              HOLD REQUEST: For secondary bus masters (SBM), the output is
              HOLDR, which is a request to acquire the bus. The bus is said
              to be acquired if the agent is a primary bus master and does
              not have its HLDA output asserted, or if the agent is a
              secondary bus master and has its HOLD input and HLDA output
              asserted.


CACHE         CACHE indicates if an access is cacheable during a Ta cycle.
              It is not asserted during any synchronous access, such as a
              synchronous load or move instruction used for sending an IAC
              message. The CACHE signal floats to a high impedance state
              when the processor is idle.
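
The address-cycle encoding described in Table 4a can be decoded as in the sketch below. It is a minimal illustration, not vendor code. The function names are hypothetical, byte enables are modeled as booleans meaning "asserted" regardless of electrical polarity, and the SIZE encoding (value 0 for one word up to value 3 for four words) is an assumption based on the order in which the burst sizes are listed.

```python
def decode_address_cycle(lad):
    """Split the 32-bit LAD value seen in a Ta cycle.

    Bits 2-31 carry the physical word address; bits 0-1 carry SIZE.
    Returns (byte address of the first word, burst length in words).
    """
    word_address = lad & 0xFFFFFFFC
    burst_words = (lad & 0x3) + 1        # assumed encoding: 0..3 -> 1..4 words
    return word_address, burst_words


def low_address_bits(byte_enables):
    """Derive address bits A1:A0 from (BE0, BE1, BE2, BE3).

    BE0 corresponds to LAD7-LAD0, so the lowest asserted enable gives
    the byte offset within the word.
    """
    asserted = [i for i, enabled in enumerate(byte_enables) if enabled]
    if not asserted:
        raise ValueError("at least one byte enable must be asserted")
    if asserted != list(range(asserted[0], asserted[-1] + 1)):
        raise ValueError("only adjacent byte enables may be asserted")
    return asserted[0]


print(decode_address_cycle(0x00001003))              # prints (4096, 4)
print(low_address_bits((False, True, True, False)))  # prints 1
```

External logic that needs a full byte address would combine the latched word address with the two bits recovered from the byte enables.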


Table 4b. 80960KA Pin Description: Module Support Signals 


Symbol        Name and Function


BADAC         BAD ACCESS, if asserted in the cycle following the one in
              which the last READY of a transaction is asserted, indicates
              that an unrecoverable error has occurred on the current bus
              transaction, or that a synchronous load/store instruction has
              not been acknowledged.

              STARTUP: During system reset, the BADAC signal is interpreted
              differently. If the signal is high, it indicates that this
              processor will perform system initialization. If it is low,
              another processor in the system will perform system
              initialization instead.


RESET         RESET clears the internal logic of the processor and causes
              it to re-initialize.

              During RESET assertion, the input pins are ignored (except for
              BADAC and IAC/INT0), the tri-state output pins are placed in a
              high impedance state, and other output pins are placed in
              their non-asserted state.

              RESET must be asserted for at least 41 CLK2 cycles for a
              predictable RESET. The HIGH to LOW transition of RESET should
              occur after the rising edge of both CLK2 and the external bus
              CLK, and before the next rising edge of CLK2.


FAILURE       INITIALIZATION FAILURE indicates that the processor has failed
              to initialize correctly. After RESET is deasserted and before
              the first bus transaction begins, FAILURE is asserted while
              the processor performs a self-test. If the self-test completes
              successfully, then FAILURE is deasserted. Next, the processor
              performs a zero checksum on the first eight words of memory.
              If it fails, FAILURE is asserted for a second time and remains
              asserted; if it passes, system initialization continues and
              FAILURE remains deasserted.


N.C.          NOT CONNECTED indicates pins should not be connected. Never
              connect any pin marked N.C.


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Table 4b. 80960KA Pin Description: Module Support Signals (Continued) 


Symbol        Name and Function


IAC/INT0      INTERAGENT COMMUNICATION REQUEST/INTERRUPT 0 indicates either
              that there is a pending IAC message for the processor or an
              interrupt. The bus interrupt control register determines in
              which way the signal should be interpreted. To signal an
              interrupt or IAC request in a synchronous system, this pin (as
              well as the other interrupt pins) must be enabled by being
              deasserted for at least one bus cycle and then asserted for at
              least one additional bus cycle; in an asynchronous system, the
              pin must remain deasserted for at least two bus cycles and
              then be asserted for at least two more bus cycles.

              LOCAL PROCESSOR NUMBER: This signal is interpreted differently
              during system reset. If the signal is at a high voltage level,
              it indicates that this processor is a primary bus master
              (Local Processor Number = 0); if it is at a low voltage level,
              it indicates that this processor is a secondary bus master
              (Local Processor Number = 1).


INT1          INTERRUPT 1, like INT0, provides direct interrupt signaling.


INT2/INTR     INTERRUPT 2/INTERRUPT REQUEST: The bus interrupt control
              register determines how this pin is interpreted. If INT2, it
              has the same interpretation as the INT0 and INT1 pins. If
              INTR, it is used to receive an interrupt request from an
              external interrupt controller.


INT3/INTA     INTERRUPT 3/INTERRUPT ACKNOWLEDGE: The bus interrupt control
(I/O, O.D.)   register determines how this pin is interpreted. If INT3, it
              has the same interpretation as the INT0, INT1, and INT2 pins.
              If INTA, it is used as an output to control
              interrupt-acknowledge bus transactions. The INTA output is
              latched on-chip and remains valid during Td cycles; as an
              output, it is open-drain.

ELECTRICAL SPECIFICATIONS


Power and Grounding

The 80960KA is implemented in CHMOS IV technology and has modest power
requirements. Its high clock frequency and numerous output buffers
(address/data, control, error, and arbitration signals) can cause power
surges as multiple output buffers drive new signal levels simultaneously.
For clean on-chip power distribution at high frequency, 12 Vcc and 13 Vss
pins separately feed functional units of the 80960KA in the PGA.

Power and ground connections must be made to all power and ground pins of
the 80960KA. On the circuit board, all Vcc pins must be strapped closely
together, preferably on a power plane. Likewise, all Vss pins should be
strapped together, preferably on a ground plane. These pins may not be
connected together within the chip.


Power Decoupling Recommendations

Liberal decoupling capacitance should be placed near the 80960KA. The
processor can cause transient power surges when driving the L-Bus,
particularly when it is connected to a large capacitive load.

Low inductance capacitors and interconnects are recommended for best high
frequency electrical performance. Inductance can be reduced by shortening
the board traces between the processor and decoupling capacitors as much as
possible. Capacitors specifically designed for PGA packages are also
commercially available and offer the lowest possible inductance.


Connection Recommendations

For reliable operation, always connect unused inputs to an appropriate
signal level. In particular, if one or more interrupt lines are not used,
they should be pulled up. No inputs should ever be left floating.



All open-drain outputs require a pullup device. While in some cases a simple
pullup resistor will be adequate, we recommend a network of pullup and
pulldown resistors biased to a valid VIH (≥3.4V) and terminated in the
characteristic impedance of the circuit board. Figure 6 shows our
recommendations for the resistor values for both a low and high current
drive network, which assumes that the circuit board has a characteristic
impedance of 100Ω. The advantage of terminating the output signals in this
fashion is that it limits signal swing and reduces AC power consumption.
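
The relationship between the bias voltage, the line impedance and the two resistors follows from the Thevenin equivalent of a resistive divider. The sketch below computes ideal resistor values for a given target; it is an illustration under assumptions, not a reproduction of the Figure 6 values, which are not legible in this copy. The function names are hypothetical, and the example targets (5V supply, 3.42V bias, 100Ω) are taken from the surrounding text.

```python
def thevenin(vcc, r_pullup, r_pulldown):
    """Return (open-circuit voltage, equivalent resistance) of the divider."""
    total = r_pullup + r_pulldown
    return vcc * r_pulldown / total, r_pullup * r_pulldown / total


def design_network(vcc, v_bias, z0):
    """Return ideal (r_pullup, r_pulldown) giving v_bias and impedance z0.

    Derived from v_bias = vcc * Rd / (Ru + Rd) and z0 = Ru * Rd / (Ru + Rd).
    """
    if not 0 < v_bias < vcc:
        raise ValueError("bias voltage must lie between ground and vcc")
    r_pullup = z0 * vcc / v_bias
    r_pulldown = z0 * vcc / (vcc - v_bias)
    return r_pullup, r_pulldown


r_up, r_down = design_network(vcc=5.0, v_bias=3.42, z0=100.0)
print(round(r_up, 1), round(r_down, 1))   # prints 146.2 316.5
```

In practice the nearest standard resistor values would be chosen and the resulting voltage and impedance rechecked with `thevenin`.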


Characteristic Curves 


Figure 7 shows the typical supply current requirements over the operating
temperature range of the processor at supply voltage (Vcc) of 5V. Figure 8
shows the typical power supply current (Icc) required by the 80960KA at
various operating frequencies when measured at three input voltage (Vcc)
levels.


For a given output current (IOL), the curve in Figure 9 shows the worst case
output low voltage (VOL).


Figure 10 shows the typical capacitive derating curve for the 80960KA
measured from 1.5V on the system clock (CLK) to 1.5V on the falling edge and
1.5V on the rising edge of the L-Bus address/data (LAD) signals.


Test Load Circuit


Figure 13 illustrates the load circuit used to test the 80960KA's tristate
pins, and Figure 14 shows the load circuit used to test the open drain
outputs. The open drain test uses an active load circuit in the form of a
matched diode bridge. Since the open-drain outputs sink current, only the
IOL legs of the bridge are necessary and the IOH legs are not used. When the
80960KA driver under test is turned off, the output pin is pulled up to VREF
(i.e., VOH). Diode D1 is turned off and the IOL current source flows through
diode D2.


When the 80960KA open-drain driver under test is on, diode D1 is also on,
and the voltage on the pin being tested drops to VOL. Diode D2 turns off and
IOL flows through diode D1.


Low Drive Network:
• VOH = 3.42V
• IOL = 25.3 mA

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High Drive Network: 
• VOH = 3.41V
• IOL = 33.8 mA


Figure 6. Connection Recommendations for Low and High Current Drive Networks 
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POWER SUPPLY CURRENT (mA) 


(Plot for Figure 7: power supply current (mA) vs. case temperature (°C),
with curves for 25 MHz, 20 MHz, 16 MHz and 10 MHz.)

(Plot for Figure 8: typical supply current (mA) vs. operating frequency
(MHz), with curves at 4.5V, 5.0V and 5.5V.)

Figure 9. Worst Case Voltage vs Output Current on Open-Drain Pins
(Plot: output low voltage (V) vs. output low current (mA).)

Figure 10. Capacitive Derating Curve
(Plot: tristate output valid delay (ns) vs. capacitive load (pF).)
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ABSOLUTE MAXIMUM RATINGS* 


Operating Temperature .......... [illegible] Case
Storage Temperature ............ −65°C to +150°C
Voltage on Any Pin ............. −0.5V to Vcc + 0.5V
Power Dissipation .............. 2.5W (25 MHz)

*WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may
cause permanent damage. These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended exposure beyond the
"Operating Conditions" may affect device reliability.


DC CHARACTERISTICS 


PGA:
80960KA (16 MHz): Tcase = 0°C to +85°C, Vcc = 5V ±10%
80960KA (20 and 25 MHz): Tcase = 0°C to +85°C, Vcc = 5V ±5%


PQFP:
80960KA (10 and 16 MHz): Tcase = 0°C to +100°C, Vcc = 5V ±10%
80960KA (20 MHz): Tcase = 0°C to +100°C, Vcc = 5V ±5%


Symbol   Parameter                                       Min    Max

VIL      Input Low Voltage                                      +0.8
VIH      Input High Voltage                              2.0
VCL      CLK2 Input Low Voltage                                 +0.8
VCH      CLK2 Input High Voltage
VOL      Output Low Voltage
VOH      Output High Voltage
ICC      Power Supply Current (10, 16, 20 and 25 MHz)
ILI      Input Leakage Current
ILO      Output Leakage Current
CIN      Input Capacitance                                      10
CO       I/O or Output Capacitance                              12
CCLK     Clock Capacitance                                      10
NOTES:
1. For tri-state outputs, this parameter is measured at:
   Address/Data .......... 4.0 mA
   Controls .............. 5.0 mA
2. This parameter is measured at:
   Address/Data .......... −1.0 mA
   Controls .............. −0.9 mA
   ALE ................... −5.0 mA
3. Input, output, and clock capacitance are not tested.
4. Not measured on open-drain outputs.
5. For open-drain outputs .......... 25 mA
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AC SPECIFICATIONS 


This section describes the AC specifications for the 80960KA pins. All input
and output timings are specified relative to the 1.5V level of the rising
edge. For output timings, the specifications refer to the time it takes the
signal to reach 1.5V. For input timings, the specifications refer to the
time at which the signal reaches (for input setup) or leaves (for hold time)
the TTL levels of LOW (0.8V) or HIGH (2.0V). All AC testing should be done
with input clock voltages of 0.4V and 2.4V, except for the clock (CLK2),
which should be tested with input voltages of 0.45 Vcc and 0.55 Vcc.



Figure 11. Drive Levels and Timing Relationships for 80960KA Signals 
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— 
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__ Figure 12. Timing Relationship of L-Bus Signals _ 
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AC Specification Tables 
80960KA AC Characteristics (10 MHz, PQFP Only) 


T1   Processor Clock Period (CLK2)       125      VIN = 1.5V
T2   Processor Clock Low Time (CLK2)              VIL = 10% Point = 1.2V
T3   Processor Clock High Time (CLK2)    12 ns    VIH = 90% Point = 0.1V + 0.5 Vcc
T4   Processor Clock Fall Time (CLK2)    10       VIN = 90% Point to 10% Point
T5   Processor Clock Rise Time (CLK2)    0        VIN = 10% Point to 90% Point


CL = 100 pF (LAD)
CL = 75 pF (Controls)(2)


CL = 75 pF




~ Qutput Valid 
Delay 

HOLDA Output 
Valid Delay 
ALE Width 


—_ ALE Output Valid Delay 


CL = 75 pF
CL = 75 pF(2)


CL = 100 pF (LAD)
CL = 75 pF (Controls)


CL = 75 pF




Output Float 
Delay 

HOLDA Output 
Float Delay 
Input Setup 1 
Input Hold 


HOLD Input 
Hold 




(2) as 


te Input Setup 2 

Setup to ALE Inactive       CL = 100 pF (LAD)
                            CL = 75 pF (Controls)

Hold after ALE Inactive     CL = 100 pF (LAD)
                            CL = 75 pF (Controls)

Reset Setup
Reset Width                 1640     41 CLK2 Periods Minimum


NOTES:

1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.

2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.

3. Clock rise and fall time is not tested. 
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80960KA AC Characteristics (16 MHz) 


Processor Clock 
Period (CLK2) 
Processor Clock 
Low Time (CLK2)
Processor Clock 
High Time (CLK2) 
Processor Clock 
Fall Time (CLK2) 
Processor Clock 
Rise Time (CLK2) 
Output Valid 
Delay 

HOLDA Output 
Valid Delay 

ALE Width 

ALE Output Valid Delay 


Output Float 
Delay 

HOLDA Output 
Float Delay 
Input Setup 1 
Input Hold 


HOLD Input 
Hold 

Input Setup 2 
Setup to ALE 
Inactive 

Hold after ALE 
inactive 

Reset Hold 


Reset Width 


NOTES:

1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Clock rise and fall time is not tested.
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Vit = 10% Point 
= 1.2V 


Vin = 90% Point 
= 0.1V + 0.5Vcc 


Vin = 90% Point to 10% - 
Point 


Vin = 10% Point to 90% 
Point 


oe = 100 pF (LAD) 
CL = 75 pF (Controls) 


CL = 75 pF(2) 


CL = 100 pF (LAD) 
CL = 75 pF (Controls)(2) 


C. = 100 oo (LAD) 
= = 75 pF (Controls) 
= 100 pF (LAD) 
a = 75 pF (Controls) 
Symbol | Parameter | Min | Max | Units | Test Conditions
T1 | Processor Clock Period (CLK2) | 25 | 125 | ns | VIN = 1.5V
T2 | Processor Clock Low Time (CLK2) | | | ns | VIL = 10% Point = 1.2V
T3 | Processor Clock High Time (CLK2) | | | ns | VIH = 90% Point = 0.1V + 0.5 VCC
T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point
T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point
Output Valid 
Delay 


T C, = 60 pF (LAD) 
= 
= 


a) = 


Cy. = 50 pF (Controls) 
' Valid Delay 

9 

9 


fe Output Float 
Delay 
Tou ~HOLDA Output 
| Float Delay , 
Input Hold 
5 


N 
© 


C. = 50 pF(2) | 


C. = 60pF (LAD) 
C. = 50 pF (Controls)(2) 


CL. = 50 pF 


ye) 
io) 


pe) 
=) 


Pio | Input Setup aaa 
T44H HOLD Input 
Hold 
fe Tes Input Setup 2 
T Setup to ALE 
Inactive 
Hold after ALE 
- |nactive 
Tas | Reset Hold 


Reset Setup | 
Reset Width 


C. = 60 pF (LAD) 

Ci = 50 pF (Controls) 

C. = 60 pF (LAD) 

C. = 50 pF (Controls) 
ee 
Ln ee en: 

41 CLK2 Periods Minimum 

NOTES: 


1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.

2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.

3. Clock rise and fall time is not tested.
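
The Reset Width entry is given as a count of CLK2 periods (41 minimum) rather than as a time. As a minimal sketch, assuming the 25 ns minimum CLK2 period listed for T1 in this table, the conversion to nanoseconds can be written as follows. The function name is hypothetical and used only for illustration.

```python
def reset_width_ns(clk2_period_ns: float, min_periods: int = 41) -> float:
    """Minimum RESET pulse width in ns for a given CLK2 period.

    min_periods defaults to the 41 CLK2 periods quoted in the table.
    """
    if clk2_period_ns <= 0:
        raise ValueError("CLK2 period must be positive")
    return min_periods * clk2_period_ns


if __name__ == "__main__":
    # 25 ns is the minimum CLK2 period (T1) listed in this table.
    print(reset_width_ns(25.0))
```

A slower CLK2 lengthens the required reset pulse proportionally, so the figure should be recomputed for the clock actually used.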


2 
4 
12 
| 2 
4 
ee 
5 
4 
7 
10 
3 
5 


Nh 


1025 


80960KA 
TRISTATE OUTPUT 


80960KA 
OPEN=DRAIN OUTPUT 


CL 


y 


~ 970775-12 lo. Tested at 25 mA 


VreF = Voc | 
D, and Do are matched 270775-13 


Figure 13. Test Load Circuit for Figure 14. Test Load Circuit for Open-Drain Output Pins 


Tri-State Output Pins 
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80960KA AC Characteristics (25 MHz, PGA Only) 


74 _ | Processor Clock 
| | Period (CLK2) 
To Processor Clock 
Low Time (CLK2) 
T3 Processor Clock 
High Time 
Ty Processor Clock | 
| Fall Time (CLK2) | 


Vit = 10% Point 
= 1.2V 

Vin = 90% Point 
=0.1V+0.5Voo | 


Vin = 90% Point to 10% 
7 Point 


— 
oe) 


oO 


Vin = 10% Point to 90% 
Point 

CL = 60 pF (LAD) 

C. = 50 pF (Controls) 


oo 


TéH HOLDA Output 
Valid Delay 


| 1 | ALE Width | : 
ALE Output Valid Delay 


ft 


C, = 50 pF(2) 


C. = 60 pF (LAD) 
C, = 50 pF (Controls) 


pe) 
oO 


HOLDA Output | 
Float Delay — 


NO 
io) 


InputSetup1 


Setup to ALE 
Inactive 


~ Hold after ALE 
Inactive. 


Reset Hold 
Reset Setup 


C_ = 60 pF (LAD) 

C. = 50 pF (Controls) 
Ci = 60 pF (LAD) 

C;, =.50 pF (Controls) 


mae 
a 
ra 
esac 
- 
i 
— 
— 


NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Clock rise and fall time is not tested. 
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HIGH LEVEL (MIN) 0.55V¢¢ 


OUTPUTS 


INIT PARAMETERS (BADAC, 
IACg) MUST BE SETUP 8 CLOCKS 


PRIOR TO THIS CLK2 EDGE Ty5 = RESET HOLD 


Tyg = RESET SETUP 


INIT PARAMETERS MUST BE HELD heave 


BEYOND THIS CLK2 EDGE 


Figure 16. RESET Signal Timing 
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; PRIMARY _ 
HLDA >| HOLDAR , 


Figure 17. Hold Timing | 


Design Considerations

Input hold times can be disregarded by the designer
whenever the input is removed because a subsequent
output from the processor is deasserted (e.g.,
DEN becomes deasserted).

Whenever the processor generates an output that
indicates a transition into a subsequent state, any
outputs that are specified to be tri-stated in this new
state are guaranteed to be tri-stated. For example, in
the Td cycle following a Ta cycle for a read, the
minimum output delay of DEN is 2 ns, but the maximum
float time of LAD is 20 ns. When DEN is asserted,
however, the LAD outputs are guaranteed to have
been tri-stated.

Designing for the ICE-960KB

The 80960KB In-Circuit Emulator assists in debugging
both 80960KA and 80960KB hardware and
software designs. The product consists of a probe
module, cable, and control unit. Because of the high
operating frequency of 80960KA systems, the probe
module connects directly to the 80960KA socket.

When designing an 80960KA hardware system that
uses the ICE-960KB to debug the system, several
electrical and mechanical characteristics should be
considered. These considerations include capacitive
loading, drive requirement, power requirement and
physical layout.

The ICE-960KB probe module increases the load
capacitance of each line by up to 25 pF. It also adds
one standard Schottky TTL load on the CLK2 line,
up to one advanced low-power Schottky TTL load
for each control signal line, and one advanced
low-power Schottky TTL load for each address/data and
byte enable line. These loads originate from the
probe module and are driven by the 80960KA processor.

To achieve high noise immunity, the ICE-960KB
probe is powered by the user’s system. The
high-speed probe circuitry draws up to 1.1A plus the
maximum current (ICC) of the 80960KA processor.


The mechanical considerations are shown in Figure
18, which illustrates the lateral clearance requirements
for the ICE-960KB probe as viewed from
above the socket of the 80960KA processor.

[Figure 18 labels: VIEW FROM ABOVE USER CPU SOCKET; ICE PROCESSOR MODULE; PROCESSOR UNDER EMULATION; VERTICAL CLEARANCE 1.2"; MINIMUM CABLE BEND RADIUS: LESS THAN 3.0"]

Figure 18. ICE-960KB Lateral Clearance Requirements

MECHANICAL DATA

Package Dimensions and Mounting

The 80960KA is available in two different packages:
a 132-lead ceramic pin-grid array (PGA) and a
132-lead plastic quad flat pack (PQFP). Pins in the
ceramic package are arranged 0.100 inch (2.54 mm)
center-to-center, in a 14 by 14 matrix, three rows
around. (See Figure 19.) The plastic package uses
fine-pitch gull wing leads arranged in a single row
along the perimeter of the package with 0.025 inch
(0.64 mm) spacing. (See Figure 20.) Dimensions are
given in Figure 21 and Table 7.

There is a wide variety of sockets available for the
ceramic PGA package, including low-insertion or
zero-insertion force mountings, and a choice of
terminals such as soldertail, surface mount, or wire
wrap. Several applicable sockets are shown in
Figure 22.

The PQFP is normally surface mounted to take best
advantage of the plastic package’s small footprint
and low cost. In some applications, however,
designers may prefer to use a socket, either to improve
heat dissipation or reduce repair costs. Figures 23a
and 23b show two of the many sockets available.


Pin Assignment 

The PGA and PQFP have different pin assignments. 
Figure 24 shows the view from the bottom of the 
PGA (pins facing up) and Figure 25 shows a view 
from the top of the PGA (pins facing down). Figures 
20 and 32 show the top view of the PQFP; notice 
that the pins are numbered in order from 1 to 132 
around the package’s perimeter. Tables 5 and 6 list 
the function of each pin in the PGA, and Tables 8 
and 9 list the function of each pin in the PQFP. 


VCC and GND connections must be made to multiple
VCC and GND pins. Each VCC and GND pin must
be connected to the appropriate voltage or ground
and externally strapped close to the package. We
recommend that you include separate power and
ground planes in your circuit board for power
distribution.


NOTE:
Pins identified as N.C., “No Connect,” should never
be connected.




Package Thermal Specification 


The 80960KA is specified for operation when case
temperature is within the range 0°C to +85°C (PGA)
or +100°C (PQFP). The case temperature should
be measured at the top center of the package as
shown in Figure 26.

The ambient temperature can be calculated from θJC
and θJA by using the following equations:

TJ = TC + P * θJC
TA = TJ - P * θJA
TC = TA + P * [θJA - θJC]

Values for θJA and θJC are given in Table 10 for the
PGA package and in Table 11 for the PQFP for various
airflows. Note that the θJA for the PGA package
can be reduced by adding a heatsink, while a heatsink
is not generally used with the plastic package
since it is intended to be surface mounted. The
maximum allowable ambient temperature (TA) permitted
without exceeding TC is shown by the charts in
Figures 27 through 30 for 10 MHz, 16 MHz, 20 MHz,
and 25 MHz respectively.

The curves assume the maximum permitted supply
current (ICC) at each speed, VCC of 5.0V, and a
TCASE of +85°C (PGA) or +100°C (PQFP).

If you will be using the 80960KA in a harsh environment
where the ambient temperature may exceed
the limits for the normal commercial part, you should
consider using an extended temperature part. These
parts are designated by the prefix “TA” and are
available at 16 MHz, 20 MHz and 25 MHz in the ceramic
PGA package. The extended operating temperature
range is -40°C to +125°C case. Figure 31 shows
the maximum allowable ambient temperature for the
20 MHz extended temperature TA80960KA at various
airflows. The curve assumes an ICC of 420 mA,
VCC of 5.0V, and a TCASE of +125°C.
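
The thermal relations above can be turned into a quick budget check. The following is a minimal sketch, not a replacement for the charts: it uses the ICC, VCC and TCASE figures quoted for the extended temperature part, while the case-to-ambient resistance of 10°C/W and the junction-to-case resistance of 2°C/W are assumed values chosen only for illustration. The function names are hypothetical. Read the actual resistances for your airflow and heatsink from Table 10 or Table 11.

```python
def power_dissipation_w(icc_amps: float, vcc_volts: float) -> float:
    """Worst-case power, P = ICC * VCC."""
    return icc_amps * vcc_volts


def max_ambient_c(t_case_max_c: float, power_w: float, theta_ca: float) -> float:
    """Rearranged from TC = TA + P * [theta_JA - theta_JC], where
    theta_CA = theta_JA - theta_JC is the case-to-ambient resistance."""
    return t_case_max_c - power_w * theta_ca


def junction_c(t_case_c: float, power_w: float, theta_jc: float) -> float:
    """TJ = TC + P * theta_JC."""
    return t_case_c + power_w * theta_jc


if __name__ == "__main__":
    p = power_dissipation_w(0.420, 5.0)        # 420 mA at 5.0V
    theta_ca_assumed = 10.0                    # assumption, degrees C per watt
    theta_jc_assumed = 2.0                     # assumption, degrees C per watt
    print(f"P      = {p:.2f} W")
    print(f"TA max = {max_ambient_c(125.0, p, theta_ca_assumed):.1f} C")
    print(f"TJ     = {junction_c(125.0, p, theta_jc_assumed):.1f} C")
```

Lowering the case-to-ambient resistance (more airflow or a heatsink on the PGA) raises the permitted ambient temperature for the same case limit.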


WAVEFORMS 


Figures 33 through 38 show the waveforms for vari- 
ous transactions on the 80960KA’s local bus. 
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SUPPORT COMPONENTS 


85C960 Burst Bus Controller 
The Intel 85C960 performs burst logic, ready generation,
and address decode for the 80960KA and
80960KB. The burst logic supports both standard
and burst mode memories and peripherals. The
ready generation and timing control supports 0 to 15
wait states across eight address ranges for read/write
and burst accesses. The address decoder decodes
eight address inputs into four external and
four internal chip selects. The wait state and chip
select values may be programmed by the user; the
timing control and burst logic are fixed.

The 85C960 operates with the 80960KA and
80960KB at all frequencies and consumes only
50 mA at 25 MHz. The 85C960 is housed in 28-pin,
300-mil ceramic DIP and plastic DIP packages or a
28-pin PLCC package for surface mount. In the ceramic
DIP package the part is UV-erasable, which makes it
easy to revise designs. Order the 85C960 data sheet
(No. 290192) for more details.

27960KX Burst Mode EPROM

Intel’s 27960KX one-megabit EPROM is designed
specifically to support the 80960KA and 80960KB. It
uses a burst interface to offer near zero wait-state
performance without the high cost of alternative
memory technologies. The 27960KX removes the
need for “dumping” code and data stored in slow
EPROMs or ROMs into expensive high-speed
“shadow” RAM.

Internally the 27960KX is organized in blocks of four
bytes that are accessed sequentially. The address
of the four-byte block is latched and incremented
internally. After a set number of wait states (1 or 2),
data is output one word at a time each subsequent
clock cycle. High-performance outputs provide zero
wait-state data-to-data burst accesses. Extra power
and ground pins dedicated to the output reduce the
effect of fast output switching on the device. The
27960KX offers 1-0-0-0 performance at 20 MHz and
2-0-0-0 performance at 25 MHz. Full details can be
found in the 27960KX data sheet (No. 290237).
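
Wait-state patterns such as 1-0-0-0 can be converted into an approximate burst transfer rate. The sketch below is a rough model under stated assumptions: each burst is taken to cost one address cycle, one data cycle per word plus that word's wait states, and one recovery cycle, with 32-bit words. These cycle counts are assumptions made for illustration, not figures from the 27960KX or 85C960 data sheets, and the function names are hypothetical.

```python
from typing import Sequence


def burst_cycles(wait_states: Sequence[int],
                 address_cycles: int = 1,
                 recovery_cycles: int = 1) -> int:
    """Total bus cycles for one burst under the assumed cycle model."""
    data_cycles = sum(w + 1 for w in wait_states)  # one data cycle per word
    return address_cycles + data_cycles + recovery_cycles


def burst_rate_mbytes_per_s(clock_mhz: float,
                            wait_states: Sequence[int],
                            bytes_per_word: int = 4) -> float:
    """Bytes moved per microsecond, i.e. Mbytes/s."""
    total_bytes = bytes_per_word * len(wait_states)
    return total_bytes * clock_mhz / burst_cycles(wait_states)


if __name__ == "__main__":
    for label, mhz, pattern in [("0-0-0-0", 25.0, (0, 0, 0, 0)),
                                ("2-0-0-0", 25.0, (2, 0, 0, 0)),
                                ("1-0-0-0", 20.0, (1, 0, 0, 0))]:
        rate = burst_rate_mbytes_per_s(mhz, pattern)
        print(f"{label} at {mhz:.0f} MHz: {rate:.1f} Mbytes/s")
```

Under these assumptions a zero wait-state, four-word burst at 25 MHz takes six cycles, which works out to roughly 66.7 Mbytes/s.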
Figure 20. The 132-Lead Plastic Quad Flat Pack (PQFP) Used to Package the 80960KA
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D2 
D fe BASE PLANE 
D1 
: fA 


li 


a | C C-ISEATING PLANE 
mm (inch) — . ~ [EXJ0.10 (0.004) 
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3.81 (0.150) MAX TYP 


DETAIL M 


E2 
1.32 (0.052) | 
1.22 (0.048) | 
~ 0.090 (0.035) MIN ee ioo7as 
2.03 (0.080 — 
1.93 (0.076 


«a«a——— 9? 


SEE DETAIL M 


LAAN NO 


1.91 (0.075) MAX TYP 


mm (inch) . 
3 270775-22 


Figure 21b. Details of the Molding of the 132-Lead PQFP 
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0.635 (0.025) 


SEE DETAIL L 


SEE DETAIL J 


mm (inch) DETAIL J DETAIL L 
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Figure 21c. Terminal Details for the 132-Lead PQFP 
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Figure 21d. Board Footprint Area for the 132-Lead PQFP 
Table 7. Package Dimension: 80960KA PQFP 


Leadcount 


Package Height 


A 
[oe | Terminal oimension 


Bumper Distance | 
Without Flash 
With Flash 


| 0363 | _LeadDimension 20.32 R 
D4,E4 Foot Radius Location 1.037 25.890 26.330 


Foot Length 0.510 0.760 
• Low insertion force (LIF) soldertail 55274-1
• Amp tests indicate 50% reduction in insertion force compared to machined sockets

Other socket options

• Zero insertion force (ZIF) soldertail 55583-1
• Zero insertion force (ZIF) Burn-in version 55573-2

Amp Incorporated
(Harrisburg, PA 17105 U.S.A.
Phone 717-564-0100)

Cam handle locks in low profile position when 80960KA is installed
(handle UP for open and DOWN for closed positions).

Courtesy Amp Incorporated


Peel-A-Way* Mylar and Kapton Socket Terminal Carriers

• Low insertion force surface mount CS132-371G
• Low insertion force soldertail CS132-01TG
• Low insertion force wire-wrap CS132-02TG (two-level), CS132-03TG (three-level)
• Low insertion force press-fit CS132-05TG

Peel-A-Way Carrier No. 132:
Kapton Carrier is KS132
Mylar Carrier is MS132
Molded Plastic Body KS132 is shown below:

[Drawing labels: .100 TYP; 14 x 14 x 3 ROWS]

Advanced Interconnections
(5 Division Street
Warwick, RI 02818 U.S.A.
Phone 401-885-0485)

Courtesy Advanced Interconnections
(Peel-A-Way Terminal Carriers
U.S. Patent No. 4442938)

*Peel-A-Way is a trademark of Advanced Interconnections.


Figure 22. Several Socket Options for Mounting the 80960KA




Figure 23a. AMP Micropitch Socket for the 132-Lead Plastic
Quad Flat Pack, 0.025” Lead Spacing, Gull Wing Leads


Part Number 
—  2-0132- 07244. 000- 01807 . 
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Figure 23b. 3M Company PQFP Socket and Lid 
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Figure 24. 80960KA PGA Pinout—View from Bottom (Pins Facing Up) 
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Figure 25. 80960KA PGA Pinout—View from Top (Pins Facing Down) 
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Table 5. 80960KA PGA Pinout—In Pin Order 
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Table 6. 80960KA PGA Pinout—!In Signal Order 
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MEASURE PGA CASE TEMPERATURE _ | MEASURE PQFP TEMPERATURE AT 
AT CENTER OF TOP SURFACE CENTER OF TOP SURFACE 


270775-32 
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Figure 26. Measuring 80960KA PGA and PQFP Case Temperature 


TEMPERATURE (°C) 
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Figure 27. 10 MHz 80960 K-Series Maximum Allowable Ambient Temperature 


TEMPERATURE (°C) 
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Figure 28. 16 MHz 80960 K-Series Maximum Allowable Ambient Temperature | 
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Figure 29. 20 MHz 80960 K-Series Maximum Allowable Ambient Temperature 


AMBIENT TEMPERATURE (°C) 


AIRFLOW (ft/min) 


fa PGA with no OPGA with omni=- @PGA with uni= 
heatsink directional heatsink directional heatsink 
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Figure 30. Maximum Allowable Ambient Temperature for 
the 80960KA at 25 MHz (available in PGA only) 
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Figure 31. Maximum Allowable Ambient Temperature for the Extended 
Temperature TA-80960KA at 20 MHz (available in PGA only) 
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INT3/INTA 
INT2/INTR 


Vss 
NC 


99 98 97 96 95 94 93 92 91 90 89 88 86 85 84 83 82 81 80.79 78 77 76 75 74 73 72 71:70 69 68 67 
LADO 100 , 
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LAD14 116 | | 80960KA 
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Figure 32. 80960KA PQFP Pinout—View from Top 
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Table 8. 80960KA Plastic Package Pinout—In Pin Order 
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Table 9. 80960KA Plastic Package Pinout—In Signal Order 
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Table 10. 80960KA PGA Package Thermal Characteristics 


Thermal — <2 > le 
Airflow—ft./min | Airflow—ft./min (m/sec) 
Parameter 100 | 200 | 400 | 600 | 800 
(0.50) | (1.01) | (2.03) | (3.04) | (4.06) 

@ Junction-to-Case 
(Case Measured 2 2 2 2. | 2 2 2 
as shown in Figure 26) 
6 Case-to-Ambient 
(No Heatsink) [ro] 10 | 7 | 15 | 2 | 10 | 9 


8 Case-to-Ambient 
(with Omnidirectional 
Heatsink) 


6 Case-to-Ambient 
(with Unidirectional) 
Heatsink) 


NOTES:

1. This table applies to 80960KA PGA plugged into socket or soldered directly into board.
2. θJA = θJC + θCA.
3. θJ-CAP = 4°C/W (approx.)
   θJ-PIN = 4°C/W (inner pins) (approx.)
   θJ-PIN = 8°C/W (outer pins) (approx.)


Table 11. 80960KA PQFP Package Thermal Characteristics 


PQFP Thermal Resistance—°C/Watt 
Airflow—ft./min (m/sec) 
Parameter 50 | 100 400 | 600 | 800 
| a (0.25) | (0.50) (2.03) | (3.04) | (4.06) 
6 Junction-to-Case 
(Case Measured 
as shown in Figure 26) 
@ Case-to-Ambient . 
(No Heatsink) ze] fre lw] |e |e 


NOTES:

1. This table applies to 80960KA PQFP soldered directly into board.
2. θJA = θJC + θCA.
3. θJL = 18°C/Watt
   θJB = 18°C/Watt
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Figure 33. Read Transaction 
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Figure 34. Write Transaction with One Wait State 
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Figure 35. Burst Read Transaction © 
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Figure 36. Burst Write Transaction with One Wait State 
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INTR can go low no sooner than 5 ns (input hold time) following the beginning of interrupt acknowledgement cycle 1. 
For a second interrupt to be acknowledged, INTR must be low for at least three cycles before it can be reasserted. 


Figure 37. Interrupt Acknowledge Transaction 
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HLDAR 


Figure 38. Bus Exchange Transaction (PBM = Primary Bus Master, SBM = Secondary Bus Master)
80960KB
EMBEDDED 32-BIT PROCESSOR
WITH INTEGRATED FLOATING-POINT UNIT

■ High-Performance Embedded Architecture
  — 25 MIPS Burst Execution at 25 MHz
  — 9.4 MIPS* Sustained Execution at 25 MHz
■ On-Chip Floating-Point Unit
  — Supports IEEE 754 Standard
  — Four 80-Bit Registers
  — 5.2 Million Whetstones/s at 25 MHz
■ 512-Byte On-Chip Instruction Cache
  — Direct Mapped
  — Parallel Load/Decode for Uncached Instructions
■ 4 Gigabyte, Linear Address Space
■ 132-Lead PGA and PQFP Packages
■ Multiple Register Sets
  — Sixteen Global 32-Bit Registers
  — Sixteen Local 32-Bit Registers
  — Four Local Register Sets Stored On-Chip
  — Register Scoreboarding
■ Built-in Interrupt Controller
  — 32 Priority Levels, 256 Vectors
  — 3.4 µs Latency
■ Easy to Use, High Bandwidth 32-Bit Bus
  — 66.7 Mbytes/s Burst
  — Up to 16 Bytes Transferred per Burst
  — Uses 85C960 Bus Controller
  — Supported by 27960KX Burst EPROMs


The 80960KB is the first member of Intel’s new 32-bit processor family, the i960 series, which is designed
especially for embedded applications. It is based on the family’s high performance, common core architecture,
and includes a 512-byte instruction cache, a built-in interrupt controller, and an integrated floating-point unit.
The 80960KB has a large register set, multiple parallel execution units and a high-bandwidth, burst bus. Using
advanced RISC technology, this high performance processor is capable of execution rates in excess of 9.4
million instructions per second.* The 80960KB is well-suited for a wide range of embedded applications,
including laser printers, image processing, industrial control, robotics and telecommunications.


*Relative to Digital Equipment Corporation’s VAX-11/780** at 1 MIPS 


[Figure 1 block labels: FOUR 80-BIT FP REGISTERS; 64 BY 32-BIT LOCAL REGISTER CACHE; SIXTEEN 32-BIT GLOBAL REGISTERS; 32-BIT EU; 80-BIT FPU; BUS CONTROL LOGIC AND INTERRUPT CONTROLLER; 512-BYTE INSTRUCTION CACHE; INSTRUCTION FETCH UNIT; INSTRUCTION DECODER; MICRO-INSTRUCTION SEQUENCER; MICRO-INSTRUCTION ROM]

Figure 1. The 80960KB’s Highly Parallel Microarchitecture


**VAX-11 is a trademark of Digital Equipment Corporation.


September 1991
Order Number: 270565-006
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THE 960 SERIES 


The 80960KB is the first member of a new family of
32-bit microprocessors from Intel known as the 960
Series. This series was especially designed to serve
the needs of embedded applications. The embedded
market includes applications as diverse as
industrial automation, avionics, image processing,
graphics, robotics, telecommunications and automobiles.
These types of applications require high
integration, low power consumption, quick interrupt
response times and high performance. Since time to
market is critical, embedded microprocessors need
to be easy to use in both hardware and software
designs.

All members of the 80960 series share a common
core architecture which utilizes RISC technology so
that, except for special functions, the family members
are object code compatible. Each new processor
in the series will add its own special set of functions
to the core to satisfy the needs of a specific
application or range of applications in the embedded
market. For example, future processors may include
a DMA controller, a timer or an A/D converter.

The 80960KB includes an integrated floating-point
unit. Intel also offers a pin-compatible version, called
the 80960KA, without an FPU, and a military-grade
version, the 80960MC, with support for memory
management, multitasking, multiprocessing and fault
tolerance.

[Figure 2 labels: GLOBAL REGISTERS(1): SIXTEEN 32-BIT REGISTERS; FLOATING-POINT REGISTERS: FOUR 80-BIT REGISTERS; LOCAL REGISTERS(2): SIXTEEN 32-BIT REGISTERS; ARITHMETIC CONTROLS, INSTRUCTION POINTER, PROCESS CONTROLS and TRACE CONTROLS: 32 BITS each; ADDRESS SPACE]

NOTES:
1. Register g15 is reserved for stack management functions.
2. Registers r0, r1, and r2 are reserved for stack management functions.

Figure 2. Register Set
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KEY PERFORMANCE FEATURES 


The 80960KB’s architecture is based on the most
recent advances in RISC technology and is grounded
in Intel’s long experience in designing embedded
controllers. Many features contribute to the
80960KB’s exceptional performance:

1. Large Register Set. Having a large number of
registers reduces the number of times that a processor
needs to access memory. Modern compilers can
take advantage of this feature to optimize execution
speed. For maximum flexibility, the 80960KB provides
32 32-bit registers and four 80-bit floating-point
registers. (See Figure 2.)

2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most programs,
so that execution speed can be greatly improved by
ensuring that these core instructions execute in as
short a time as possible. The most-frequently executed
instructions such as register-register moves,
add/subtract, logical operations, and shifts execute
in one to two cycles. (Table 1 contains a list of
instructions.)

3. Load/Store Architecture. Like other processors
based on RISC technology, the 80960KB has a
Load/Store architecture: only the LOAD and STORE
instructions reference memory; all other instructions
operate on registers. This type of architecture simplifies
instruction decoding and is used in combination
with other techniques to increase parallelism.

[Figure 3 labels: Control; Compare and Branch; Register to Register; Memory Access—Short; Memory Access—Long; fields shown include Displacement and Reg/Lit]

Figure 3. Instruction Formats
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Table 1. 80960KB Instruction Set 


Data Movement: Load, Store, Move, Load Address

Arithmetic: Add, Subtract, Multiply, Divide, Remainder, Modulo, Shift, Extended Multiply, Extended Divide

Logical: And, Not And, And Not, Or, Exclusive Or, Not Or, Or Not, Nor, Exclusive Nor, Not, Nand, Rotate

Bit and Bit Field: Set Bit, Clear Bit, Not Bit, Check Bit, Alter Bit, Scan for Bit, Scan over Bit, Extract, Modify

Comparison: Compare, Conditional Compare, Compare and Increment, Compare and Decrement

Branch: Unconditional Branch, Conditional Branch, Compare and Branch

Call/Return: Call, Call Extended, Call System, Return, Branch and Link

Fault: Conditional Fault, Synchronize Faults

Debug: Modify Trace Controls, Mark, Force Mark

Miscellaneous: Atomic Add, Atomic Modify, Flush Local Registers, Modify Arithmetic Controls, Modify Process Controls, Scan Byte for Equal, Test Condition Code

Decimal: Move, Add with Carry, Subtract with Carry

Conversion: Convert Real to Integer, Convert Integer to Real

Floating-Point: Move Real, Add, Subtract, Multiply, Divide, Remainder, Scale, Round, Square Root, Sine, Cosine, Tangent, Arctangent, Log, Log Binary, Log Natural, Exponent, Classify, Copy Real Extended, Compare

Synchronous: Synchronous Load, Synchronous Move
4. Simple Instruction Formats. All instructions in
the 80960KB are 32 bits long and must be aligned
on word boundaries. This alignment makes it possible
to eliminate the instruction-alignment stage in
the pipeline. To simplify the instruction decoder further,
there are only five instruction formats and each
instruction uses only one format. (See Figure 3.)


5. Overlapped Instruction Execution. A load oper- 
ation allows execution of subsequent instructions to 
continue before the data has been returned from 
memory, so that these instructions can overlap the 
load. The 80960KB manages this process transpar- 
ently to software through the use of a register score- 
board. Conditional instructions also make use of a 
scoreboard so that subsequent unrelated instruc- 
tions can be executed while the conditional instruc- 
tion is pending. 


6. Integer Execution Optimization. When the re- 
sult of an operation is used as an operand in a sub- 
sequent calculation, the value is sent immediately to 
its destination register. Yet at the same time, the 
value is put back on a bypass path to the ALU, 
thereby saving the time that otherwise would be re- 
quired to retrieve the value for the next operation. 


7. Bandwidth Optimizations. The 80960KB gets 
optimal use of its memory bus bandwidth because 
the bus is tuned for use with the cache: the line size 
of the instruction cache matches the maximum burst 
size for instruction fetches. The 80960KB automati- 
cally fetches four words in a burst and stores them 
directly in the cache. Due to the size of the cache 
and the fact that it is continually filled in anticipation 
of needed instructions in the program flow, the 
80960KB is exceptionally insensitive to memory wait 
states. In fact, each wait state causes only a 7% 
degradation in system performance. The benefit is
that the 80960KB will deliver outstanding perform- 
ance even with a low cost memory system. 


8. Cache Bypass. If there is a cache miss, the proc- 
essor fetches the needed instruction, then sends it 
on to the instruction decoder at the same time it 
updates the cache. Thus, no extra time is taken to 
load and read the cache. 


Memory Space and Addressing Modes 


The 80960KB offers a linear programming environment
so that all programs running on the processor
are contained in a single address space. The maximum
size of the address space is 4 Gigabytes (2^32
bytes).


For ease of use, the 80960KB has a small number of 
addressing modes, but includes all those necessary 
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to ensure efficient compiler implementations of high- 
level languages such as C, Fortran and Ada. Table 2 
lists the memory addressing modes. 
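
The addressing modes in Table 2 are all combinations of a base register, a scaled index register and an offset or displacement. As a minimal sketch, an effective address for any of these modes can be modeled as below. The function and parameter names are hypothetical, and wrapping the result to 32 bits is an assumption based on the 4 Gigabyte linear address space rather than a documented behavior.

```python
MASK32 = 0xFFFFFFFF
VALID_SCALES = (1, 2, 4, 8, 16)  # scale factors listed under Table 2


def effective_address(base: int = 0, index: int = 0,
                      scale: int = 1, displacement: int = 0) -> int:
    """base + (index * scale) + displacement, truncated to 32 bits.

    Unused components default to zero, so the same function covers the
    offset-only, register-indirect and indexed forms.
    """
    if scale not in VALID_SCALES:
        raise ValueError(f"scale factor must be one of {VALID_SCALES}")
    return (base + index * scale + displacement) & MASK32


if __name__ == "__main__":
    # Register + (Index-Register x Scale-Factor) + 32-Bit Displacement,
    # e.g. the fourth 4-byte element of an array inside a structure.
    print(hex(effective_address(base=0x1000, index=3, scale=4,
                                displacement=0x20)))
```

The scale factors match the operand sizes from byte to quad-word, which is what lets a compiler index arrays of any supported data type with a single instruction.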


Data Types 
The 80960KB recognizes the following data types: 


Numeric:

• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers
• 32-, 64- and 80-bit real numbers


Non-Numeric:

• Bit
• Bit Field
• Triple-Word (96 bits)
• Quad-Word (128 bits)


Large Register Set 


The programming environment of the 80960KB in- 
cludes a large number of registers. In fact, 36 regis- 
ters are available at any time. The availability of this 
many registers greatly reduces the number of mem- 
ory accesses required to execute most programs, 
which leads to greater instruction processing speed. 


There are two types of general-purpose registers:
local and global. The 20 global registers consist of
sixteen 32-bit registers (G0 through G15) and four
80-bit registers (FP0 through FP3). These registers
perform the same function as the general-purpose
registers provided in other popular microprocessors.
The term global refers to the fact that these registers
retain their contents across procedure calls.


The local registers, on the other hand, are procedure
specific. For each procedure call, the 80960KB
allocates 16 local registers (R0 through R15). Each
local register is 32 bits wide. Any register can also
be used for single or double-precision floating-point
operations; the 80-bit floating-point registers are
provided for extended precision.


Multiple Register Sets 


To further increase the efficiency of the register set, 
multiple sets of local registers are stored on-chip. 
This cache holds up to four local register frames, 
which means that up to three procedure calls can be 
made without having to access the procedure stack 
resident in memory. 


Table 2. Memory Addressing Modes

12-Bit Offset
32-Bit Offset
Register-Indirect
Register + 12-Bit Offset
Register + 32-Bit Offset
Register + (Index-Register × Scale-Factor)
Register × Scale-Factor + 32-Bit Displacement
Register + (Index-Register × Scale-Factor) + 32-Bit Displacement

Scale-Factor is 1, 2, 4, 8 or 16


Although programs may have procedure calls nested
many calls deep, a program typically oscillates
back and forth between only two or three levels. As
a result, with four stack frames in the cache, the
probability of there being a free frame on the cache
when a call is made is very high. In fact, runs of
representative C-language programs show that 80%
of the calls are handled without needing to access
memory.


If there are four or more active procedures and a 
new procedure is called, the processor moves the 
oldest set of local registers in the register cache to a 
procedure stack in memory to make room for a new 
set of registers. Global register G15 is used by the 
processor as the frame pointer (FP) for the procedure 
stack. 

Note that the global and floating-point registers are 
not exchanged on a procedure call, but retain their 
contents, making them available to all procedures 
for fast parameter passing. An illustration of the reg- 
ister cache is shown in Figure 4. 
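
The register cache behavior described above can be modeled in a few lines. The following Python sketch is an illustration only, not a description of the hardware: the class name LocalRegisterCache and its methods are hypothetical, and the reload-on-return policy is an assumption, since the text specifies only how frames are saved.

```python
from collections import deque


class LocalRegisterCache:
    """Toy model of an on-chip cache holding up to four local register frames."""

    def __init__(self, frames=4):
        self.frames = frames
        # The running procedure always owns one frame of 16 local registers.
        self.cached = deque([[0] * 16])
        self.stack_in_memory = []   # frames moved out to the procedure stack
        self.memory_transfers = 0   # frame saves plus frame reloads

    def call(self):
        # With all frames in use, the oldest frame is moved to memory first.
        if len(self.cached) == self.frames:
            self.stack_in_memory.append(self.cached.popleft())
            self.memory_transfers += 1
        self.cached.append([0] * 16)

    def ret(self):
        if len(self.cached) == 1 and not self.stack_in_memory:
            raise RuntimeError("return from outermost procedure")
        self.cached.pop()
        # Assumption: a saved frame is reloaded only when it is returned to.
        if not self.cached:
            self.cached.append(self.stack_in_memory.pop())
            self.memory_transfers += 1


cache = LocalRegisterCache()
for _ in range(3):
    cache.call()
print(cache.memory_transfers)  # 0: three nested calls fit in the cache
cache.call()
print(cache.memory_transfers)  # 1: the fourth call saves the oldest frame
```

In this model a program that oscillates between two or three call levels never touches the procedure stack, which is the effect the text describes.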


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Instruction Cache 


To further reduce memory accesses, the 80960KB 
includes a 512-byte on-chip instruction cache. The 
instruction cache is based on the concept of locality 
of reference; that is, most programs are not usually 
executed in a steady stream but consist of many 
branches and loops that lead to jumping back and 
forth within the same small section of code. Thus, by 
maintaining a block of instructions in a cache, the 
number of memory references required to read in- 
structions into the processor can be greatly reduced. 


To load the instruction cache, instructions are 
fetched in 16-byte blocks, so that up to four instruc- 
tions can be fetched at one time. An efficient 
prefetch algorithm increases the probability that an 
instruction will already be in the cache when it is 
needed. 
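
The cache figures above imply some simple arithmetic. The sketch below is a hedged illustration: it assumes one instruction occupies a single 32-bit word, which is consistent with "up to four" instructions per 16-byte block, and it says nothing about how the cache is organized internally. The function names are hypothetical.

```python
CACHE_BYTES = 512        # on-chip instruction cache size (from the text)
BLOCK_BYTES = 16         # fetch block size (from the text)
INSTRUCTION_BYTES = 4    # assumption: one 32-bit word per instruction


def blocks_in_cache():
    """Number of 16-byte fetch blocks the cache can hold."""
    return CACHE_BYTES // BLOCK_BYTES


def loop_fits(n_instructions):
    """True if a loop of n one-word instructions fits entirely in the cache."""
    return n_instructions * INSTRUCTION_BYTES <= CACHE_BYTES


print(blocks_in_cache())   # 32
print(loop_fits(100))      # True
print(loop_fits(200))      # False
```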


Code for small loops will often fit entirely within the 
cache, leading to a great increase in processing 
speed since further memory references might not be 
necessary until the program exits the loop. Similarly, 
when calling short procedures, the code for the call- 
ing procedure is likely to remain in the cache, so it 
will be there on the procedure’s return. 


Register Scoreboarding 


The instruction decoder has been optimized in sev- 
eral ways. One of these optimizations is the ability to 
do instruction overlapping by means of register 
scoreboarding. 


Register scoreboarding occurs when a LOAD instruction 
is executed to move a variable from memory 
into a register. When the instruction is initiated, a 
scoreboard bit on the target register is set. When the 
register is actually loaded, the bit is reset. In between, 
any reference to the register contents is accompanied 
by a test of the scoreboard bit to ensure 
that the load has completed before processing continues. 
Since the processor does not have to wait for 
the LOAD to be completed, it can go on to execute 
additional instructions placed in between the LOAD 
instruction and the instruction that uses the register 
contents, as shown in the following example: 


LOAD R4, address 1 
LOAD R5, address 2 
Unrelated instruction 
Unrelated instruction 
ADD R4, R5, R6 
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In essence, the two unrelated instructions between 
the LOAD and ADD instructions are executed for 
free (i.e., take no apparent time to execute) because 
they are executed while the register is being loaded. 
Up to three LOAD instructions can be pending at 
one time with three corresponding scoreboard bits 
set. By exploiting this feature, system programmers 
and compilers have a useful tool for optimizing exe- 
cution speed. 
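
The effect of scoreboarding on timing can be sketched with a small issue model. Everything numeric here is an assumption made for illustration: the load latency of three cycles, the one-instruction-per-cycle issue rate, and the tuple format of the program are all hypothetical and are not taken from the 80960KB timing specification. Only the limit of three pending loads comes from the text.

```python
LOAD_LATENCY = 3    # hypothetical cycles between issuing a LOAD and data arrival
MAX_PENDING = 3     # up to three LOADs may be pending at one time (from the text)


def run(program, load_latency=LOAD_LATENCY):
    """Return the cycle count needed to issue every instruction in program.

    program is a list of (op, dest, sources) tuples. ready_at maps a register
    to the cycle at which its scoreboard bit is reset.
    """
    ready_at = {}
    cycle = 0
    for op, dest, sources in program:
        # Test the scoreboard bit of every source register; stall if it is set.
        for reg in sources:
            cycle = max(cycle, ready_at.get(reg, 0))
        if op == "load":
            pending = sorted(t for t in ready_at.values() if t > cycle)
            if len(pending) >= MAX_PENDING:
                cycle = pending[-MAX_PENDING]   # wait until a pending slot frees
            ready_at[dest] = cycle + 1 + load_latency
        cycle += 1
    return cycle


back_to_back = [
    ("load", "r4", []),
    ("load", "r5", []),
    ("add", "r6", ["r4", "r5"]),
]
interleaved = [
    ("load", "r4", []),
    ("load", "r5", []),
    ("other", None, []),
    ("other", None, []),
    ("add", "r6", ["r4", "r5"]),
]
print(run(back_to_back), run(interleaved))  # same total: the extra work is hidden
```

Under these assumptions both sequences finish in the same number of cycles, which is what "executed for free" means in the paragraph above.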


Floating-Point Arithmetic 


In the 80960KB, floating-point arithmetic has been 
made an integral part of the architecture. Having the 
floating-point unit integrated on-chip provides two 
advantages. First, it improves the performance of 
the chip for floating-point applications, since no 
additional bus overhead is associated with floating-point 
calculations, thereby leaving more time for other 
bus operations such as I/O. Second, the cost of 
using floating-point operations is reduced because a 
separate coprocessor chip is not required. 


The 80960KB floating-point (real number) data types 
include single-precision (32-bit), double-precision 
(64-bit), and extended precision (80-bit) floating-point 
numbers. Any register may be used to execute 
floating-point operations. 


The processor provides hardware support for both 
mandatory and recommended portions of IEEE 
Standard 754 for floating-point arithmetic, including 
all arithmetic, exponential, logarithmic, and other 
transcendental functions. Table 3 shows execution 
times for some representative instructions. 


Table 3. Sample Floating-Point Execution 
Times (µs) at 25 MHz 


Add 
Subtract 
Multiply 
Divide 


Square Root 
Arctangent 
Exponent 
Sine 

Cosine 
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High Bandwidth Local Bus 


An 80960KB CPU resides on a high-bandwidth ad- 
dress/data bus known as the local bus (L-Bus). The 
L-Bus provides a direct communication path be- 
tween the processor and the memory and I/O sub- 
system interfaces. The processor uses the local bus 
to fetch instructions, manipulate memory, and respond 
to interrupts. Its features include: 


• 32-bit multiplexed address/data path 


• Four-word burst capability, which allows transfers 
  from 1 to 16 bytes at a time 


• High-bandwidth reads and writes at 66.7 Mbytes 
  per second 


• Special signal to indicate whether a memory 
  transaction can be cached 


Figure 5 identifies the groups of signals which con- 
stitute the L-Bus. Table 4 lists the function of the L- 
Bus and other processor-support signals, such as 
the interrupt lines. 
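
The 66.7 Mbytes per second figure is consistent with a simple back-of-the-envelope calculation. The sketch below assumes a 25 MHz bus clock, a four-word burst with no wait states, and two overhead clocks per burst (one address cycle and one trailing cycle). These cycle counts are assumptions chosen for illustration; the authoritative numbers are in the timing diagrams.

```python
def burst_bandwidth_mbytes(bus_mhz=25.0, words=4, overhead_clocks=2):
    """Peak burst bandwidth in Mbytes/s under the stated assumptions."""
    bytes_moved = words * 4            # 32-bit words
    clocks = words + overhead_clocks   # one data clock per word plus overhead
    return bytes_moved * bus_mhz / clocks


print(round(burst_bandwidth_mbytes(), 1))  # 66.7
```

A single-word transfer under the same assumptions moves 4 bytes in 3 clocks, so the burst capability roughly doubles the usable bandwidth.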


Interrupt Handling 


The 80960KB can be interrupted in one of two ways: 
by the activation of one of four interrupt pins or by 
sending a message on the processor’s data bus. 


The 80960KB is unusual in that it automatically handles 
interrupts on a priority basis and tracks pending 
interrupts through its on-chip interrupt controller. 
Two of the interrupt pins can be configured to provide 
8259A handshaking for expansion beyond four 
interrupt lines. 
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| Debug Features 


The 80960KB has built-in debug capabilities. There 
are two types of breakpoints and six different trace 
modes. The debug features are controlled by two 
internal 32-bit registers, the Process-Controls Word 
and the Trace-Controls Word. By setting bits in 
these control words, a software debug monitor can 
closely control how the processor responds during 
program execution. 


The 80960KB has both hardware and software 
breakpoints. It provides two hardware breakpoint 
registers on-chip which can be set by a special command 
to any value. When the instruction pointer 
matches the value in one of the breakpoint registers, 
the breakpoint will fire, and a breakpoint handling 
routine is called automatically. 


The 80960KB also provides software breakpoints 
through the use of two instructions, MARK and 
FMARK. These instructions can be placed at any 
point in a program and will cause the processor to 
halt execution at that point and call the breakpoint 
handling routine. The breakpoint mechanism is easy 
to use and provides a powerful debugging tool. 


Tracing is available for instructions (single-step exe- 
cution), calls and returns, and branching. Each dif- 
ferent type of trace may be enabled separately by a 
special debug instruction. In each case, the 
80960KB executes the instruction first and then 
calls a trace handling routine (usually part of a soft- 
ware debug monitor). Further program execution is 
halted until the trace routine is completed. When the 
trace event handling routine is completed, instruc- 
tion execution resumes at the next instruction. The 


LOCAL BUS SIGNAL GROUPS 

ADDRESS/DATA (32 LINES) 
CONTROL (ADDRESS, DATA, and OPERATION SIGNALS = 15 LINES) 
ARBITRATION (2 LINES) 


Figure 5. Local Bus Signal Groups 
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80960KB’s tracing mechanisms, which are imple- 
mented completely in hardware, greatly simplify the 
task of testing and debugging software. 


FAULT DETECTION 


The 80960KB has an automatic mechanism to 
handle faults. There are ten fault types including 
trace, arithmetic, and floating-point faults. When the 
processor detects a fault, it automatically calls the 
appropriate fault handling routine and saves the current 
instruction pointer and necessary state information 
to make efficient recovery possible. The processor 
posts diagnostic information on the type of fault 
to a Fault Record. Like interrupt handling routines, 
fault handling routines are usually written to meet 
the needs of a specific application and are often included 
as part of the operating system or kernel. 


For each of the ten fault types, there are numerous 
subtypes that provide specific information about a 
fault. For example, a floating-point fault may have its 
subtype set to an Overflow or Zero-Divide fault. The 
fault handler can use this specific information to respond 
correctly to the fault. 


BUILT-IN TESTABILITY 


Upon reset, the 80960KB automatically conducts an 
extensive internal test (self-test) of its major blocks 
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of logic. Then, before executing its first instruction, it 
does a zero check sum on the first eight words in 
memory to ensure that the system has been loaded 
correctly. If a problem is discovered at any point dur- 
ing the self-test, the 80960KB will assert its FAIL- 
URE pin and will not begin program execution. The 
self-test takes approximately 47,000 cycles to com- 
plete. 
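
A boot image can be checked against this requirement before it is programmed into memory. The sketch below assumes that "zero check sum" means the first eight 32-bit words sum to zero modulo 2^32; the processor's exact algorithm (for example, how carries are handled) is not given in this section and may differ. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def zero_checksum_ok(words):
    """True if the eight 32-bit words sum to zero modulo 2**32 (assumed rule)."""
    if len(words) != 8:
        raise ValueError("exactly eight words are checked")
    return (sum(w & MASK32 for w in words) & MASK32) == 0


def balancing_word(first_seven):
    """Eighth word that makes the assumed checksum come out to zero."""
    if len(first_seven) != 7:
        raise ValueError("expected seven words")
    return (-sum(w & MASK32 for w in first_seven)) & MASK32


image = [0x00001000, 0x00002000, 0, 0x00000100, 0, 0, 0]
image.append(balancing_word(image))
print(zero_checksum_ok(image))  # True
```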


System manufacturers can use the 80960KB’s self- 
test feature during incoming parts inspection. No 
special diagnostic programs need to be written, and 
the test is both thorough and fast. The self-test ca- 
pability helps ensure that defective parts will be dis- 
covered before systems are shipped, and once in 
the field, the self-test makes it easier to distinguish 
between problems caused by processor failure and 
problems resulting from other causes. 


CHMOS 


The 80960KB is fabricated using Intel's CHMOS IV 
(Complementary High Speed Metal Oxide Semiconductor) 
process. This advanced technology eliminates 
the frequency and reliability limitations of older 
CMOS processes and opens a new era in microprocessor 
performance. It combines the high performance 
capabilities of Intel's industry-leading 
HMOS technology with the high density and low 
power characteristics of CMOS. The 80960KB is 
available at 10, 16, 20 and 25 MHz. 


Table 4a. 80960KB Pin Description: L-Bus Signals 


Symbol    Name and Function 


CLK2 
SYSTEM CLOCK provides the fundamental timing for 80960KB systems. It is 
divided by two inside the 80960KB to generate the internal processor clock. 


LAD31-LAD0 
LOCAL ADDRESS/DATA BUS carries 32-bit physical addresses and data to and 
from memory. During an address (Ta) cycle, bits 2-31 contain a physical word 
address (bits 0-1 indicate SIZE; see below). During a data (Td) cycle, bits 0-31 
contain read or write data. The LAD lines are active HIGH and float to a high 
impedance state when not active. 


SIZE, which is composed of bits 0-1 of the LAD lines during a Ta cycle, specifies 
the size of a burst transfer in words. 


LAD1   LAD0 
0      0      1 Word 
0      1      2 Words 
1      0      3 Words 
1      1      4 Words 


ALE 
ADDRESS-LATCH ENABLE indicates the transfer of a physical address. ALE is 
asserted during a Ta cycle and deasserted before the beginning of the Td state. It 
is active LOW and floats to a high impedance state during a hold cycle (Th or Thr). 


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state 
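
External logic that latches the LAD lines during an address cycle has to separate the word address from the SIZE field. The following sketch shows one way to do that in software. It assumes the natural binary encoding of SIZE (00 = 1 word through 11 = 4 words); the function name is hypothetical.

```python
def decode_address_cycle(lad):
    """Split a 32-bit LAD value sampled during Ta into (word address, burst words)."""
    lad &= 0xFFFFFFFF
    word_address = lad & 0xFFFFFFFC   # bits 2-31: physical word address
    burst_words = (lad & 0x3) + 1     # bits 0-1: SIZE, assumed 00 = 1 ... 11 = 4
    return word_address, burst_words


addr, words = decode_address_cycle(0x00001003)
print(hex(addr), words)  # 0x1000 4
```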
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Table 4a. 80960KB Pin Description: L-Bus panel (Continued) — 


ADS 
ADDRESS/DATA STATUS indicates an address state. ADS is asserted every Ta 
state and deasserted during the following Td state. For a burst transaction, 
ADS is asserted again every Td state where READY was asserted in the previous 
cycle. 


W/R 
WRITE/READ specifies, during a Ta cycle, whether the operation is a write or 
read. It is latched on-chip and remains valid during Td cycles. 


DT/R 
DATA TRANSMIT/RECEIVE indicates the direction of data transfer to and from 
the L-Bus. It is low during Ta and Td cycles for a read or interrupt 
acknowledgement; it is high during Ta and Td cycles for a write. DT/R never 
changes state when DEN is asserted (see Timing Diagrams). 


READY 
READY indicates that data on LAD lines can be sampled or removed. If READY is 
not asserted during a Td cycle, the Td cycle is extended to the next cycle by 
inserting a wait state (Tw), and ADS is not asserted in the next cycle. 


LOCK 
BUS LOCK prevents other bus masters from gaining control of the L-Bus 
following the current cycle (if they would assert LOCK to do so). LOCK is used by 
the processor or any bus agent when it performs indivisible Read/Modify/Write 
(RMW) operations. Do not leave LOCK unconnected. It must be pulled high for the 
processor to function properly. 


For a read that is designated as a RMW-read, LOCK is examined. If asserted, the 
processor waits until it is not asserted; if not asserted, the processor asserts 
LOCK during the Ta cycle and leaves it asserted. 


A write that is designated as an RMW-write deasserts LOCK in the Ta cycle. 
During the time LOCK is asserted, a bus agent can perform a normal read or write 
but no RMW operations. LOCK is also held asserted during an interrupt-acknowledge 
transaction. 


BE3-BE0 
BYTE ENABLE LINES specify which data bytes (up to four) on the bus take part 
in the current bus cycle. BE3 corresponds to LAD31-LAD24 and BE0 corresponds 
to LAD7-LAD0. 


The byte enables are provided in advance of data. The byte enables asserted 
during Ta specify the bytes of the first data word. The byte enables asserted 
during Td specify the bytes of the next data word (if any), that is, the word to be 
transmitted following the next assertion of READY. The byte enables during the 
Td cycles preceding the last assertion of READY are undefined. The byte enables 
are latched on-chip and remain constant from one Td cycle to the next when 
READY is not asserted. 


For reads, the byte enables specify the byte(s) that the processor will actually use. 
L-Bus agents are required to assert only adjacent byte enables (e.g., asserting just 
BE0 and BE2 is not permitted), and are required to assert at least one byte enable. 
To produce address bits A0 and A1 externally, they can be decoded from the byte 
enables. 


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state 
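
The decoding of A0 and A1 from the byte enables can be sketched as follows. This is an illustration, not Intel's reference logic: it assumes the two address bits identify the lowest-numbered asserted byte lane, and it treats "asserted" as a logical condition without regard to the electrical polarity of the pins. The function name is hypothetical.

```python
def low_address_bits(asserted):
    """Derive (A1, A0) from the set of asserted byte-enable indices (0 to 3)."""
    lanes = sorted(set(asserted))
    if not lanes:
        raise ValueError("at least one byte enable must be asserted")
    if lanes[0] < 0 or lanes[-1] > 3:
        raise ValueError("byte enable index must be 0..3")
    # Only adjacent byte enables are permitted (BE0 with BE2 alone is illegal).
    if lanes != list(range(lanes[0], lanes[-1] + 1)):
        raise ValueError("asserted byte enables must be adjacent")
    first = lanes[0]                    # assumed: address of the lowest active byte
    return (first >> 1) & 1, first & 1


print(low_address_bits({2, 3}))         # (1, 0)
print(low_address_bits({0, 1, 2, 3}))   # (0, 0)
```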
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Table 4a. 80960KB Pin Description: L-Bus Signals (Continued) 


Symbol Name and Function 


HOLD/HLDAR (I) 
HOLD: If the processor is the primary bus master (PBM), the input is interpreted 
as HOLD, a request from a secondary bus master to acquire the bus. When the 
processor receives HOLD and grants another master control of the bus, it floats 
its tri-state bus lines and then asserts HLDA and enters the Th state. When HOLD 
is deasserted, the processor will deassert HLDA and go to either the Ti or Ta 
state. 
HOLD ACKNOWLEDGE RECEIVED: If the processor is a secondary bus master 
(SBM), the input is HLDAR, which indicates, when HOLDR output is high, that the 
processor has acquired the bus. Processors and other agents can be told at reset 
if they are the primary bus master (PBM). 


HLDA/HOLDR (O, T.S.) 
HOLD ACKNOWLEDGE: If the processor is a primary bus master, the output is 
HLDA, which relinquishes control of the bus to another bus master. 
HOLD REQUEST: For secondary bus masters (SBM), the output is HOLDR, which 
is a request to acquire the bus. The bus is said to be acquired if the agent is a 
primary bus master and does not have its HLDA output asserted, or if the agent is 
a secondary bus master and has its HOLD input and HLDA output asserted. 


CACHE (O) 
CACHE indicates if an access is cacheable during a Ta cycle. It is not asserted 
during any synchronous access, such as a synchronous load or move instruction 
used for sending an IAC message. The CACHE signal floats to a high impedance 
state when the processor is idle. 


Table 4b. 80960KB Pin Description: Module Support Signals 

Symbol    Name and Function 


BADAC 
BAD ACCESS, if asserted in the cycle following the one in which the last READY 
of a transaction is asserted, indicates that an unrecoverable error has occurred on 
the current bus transaction, or that a synchronous load/store instruction has not 
been acknowledged. 
STARTUP: During system reset, the BADAC signal is interpreted differently. If the 
signal is high, it indicates that this processor will perform system initialization. If it 
is low, another processor in the system will perform system initialization instead. 


RESET 
RESET clears the internal logic of the processor and causes it to re-initialize. 


During RESET assertion, the input pins are ignored (except for BADAC and 
IAC/INT0), the tri-state output pins are placed in a high impedance state, and 
other output pins are placed in their non-asserted state. 


RESET must be asserted for at least 41 CLK2 cycles for a predictable RESET. 
The HIGH to LOW transition of RESET should occur after the rising edge of both 
CLK2 and the external bus CLK, and before the next rising edge of CLK2. 


FAILURE 
INITIALIZATION FAILURE indicates that the processor has failed to initialize 
correctly. After RESET is deasserted and before the first bus transaction begins, 
FAILURE is asserted while the processor performs a self-test. If the self-test 
completes successfully, then FAILURE is deasserted. Next, the processor 
performs a zero checksum on the first eight words of memory. If it fails, FAILURE 
is asserted for a second time and remains asserted; if it passes, system 
initialization continues and FAILURE remains deasserted. 


N.C. 
NOT CONNECTED indicates pins should not be connected. Never connect any 
pin marked N.C. 


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state 
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es 80960KB Pin Description: Module Support Signals (Continued) 


IAC/INT0 
INTERAGENT COMMUNICATION REQUEST/INTERRUPT 0 indicates either 
that there is a pending IAC message for the processor or an interrupt. The bus 
interrupt control register determines in which way the signal should be interpreted. 
To signal an interrupt or IAC request in a synchronous system, this pin (as well as 
the other interrupt pins) must be enabled by being deasserted for at least one bus 
cycle and then asserted for at least one additional bus cycle; in an asynchronous 
system, the pin must remain deasserted for at least two bus cycles and then be 
asserted for at least two more bus cycles. 


LOCAL PROCESSOR NUMBER: This signal is interpreted differently during 
system reset. If the signal is at a high voltage level, it indicates that this processor 
is a primary bus master (Local Processor Number = 0); if it is at a low voltage 
level, it indicates that this processor is a secondary bus master (Local Processor 
Number = 1). 


INT1 
INTERRUPT 1, like INT0, provides direct interrupt signaling. 


INT2/INTR 
INTERRUPT 2/INTERRUPT REQUEST: The bus interrupt control register determines 
how this pin is interpreted. If INT2, it has the same interpretation as the INT0 and 
INT1 pins. If INTR, it is used to receive an interrupt request from an external 
interrupt controller. 


INT3/INTA 
INTERRUPT 3/INTERRUPT ACKNOWLEDGE: The bus interrupt control register 
determines how this pin is interpreted. If INT3, it has the same interpretation as 
the INT0, INT1, and INT2 pins. If INTA, it is used as an output to control interrupt- 
acknowledge bus transactions. The INTA output is latched on-chip and remains 
valid during Td cycles; as an output, it is open-drain. 


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state 


ELECTRICAL SPECIFICATIONS 


Power and Grounding 


The 80960KB is implemented in CHMOS IV technol- 
ogy and has modest power requirements. Its high 
clock frequency and numerous output buffers (ad- 
dress/data, control, error, and arbitration signals) 
can cause power surges as multiple output buffers 
drive new signal levels simultaneously. For clean on- 
chip power distribution at high frequency, 12 Vcc 
and 13 Vss pins separately feed functional units of 
the 80960KB in the PGA. 


Power and ground connections must be made to all 
power and ground pins of the 80960KB. On the circuit 
board, all Vcc pins must be strapped closely 
together, preferably on a power plane. Likewise, all 
Vss pins should be strapped together, preferably on 
a ground plane. These pins may not be connected 
together within the chip. 


Power Decoupling Recommendations 


Liberal decoupling capacitance should be placed 
near the 80960KB. The processor can cause transient 
power surges when driving the L-Bus, particularly 
when it is connected to a large capacitive load. 


Low inductance capacitors and interconnects are 
recommended for best high frequency electrical performance. 
Inductance can be reduced by shortening 
the board traces between the processor and decoupling 
capacitors as much as possible. Capacitors 
specifically designed for PGA packages are also 
commercially available and offer the lowest possible 
inductance. 


Connection Recommendations 


For reliable operation, always connect unused in- 
puts to an appropriate signal level. In particular, if 
one or more interrupt lines are not used, they should 
be pulled up. No inputs should ever be left floating. 
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All open-drain outputs require a pullup device. While 
in some cases a simple pullup resistor will be adequate, 
we recommend a network of pullup and pulldown 
resistors biased to a valid VIH (≥3.4V) and 
terminated in the characteristic impedance of the circuit 
board. Figure 6 shows our recommendations for 
the resistor values for both a low and high current 
drive network, which assumes that the circuit board 
has a characteristic impedance of 100Ω. The advantage 
of terminating the output signals in this fashion 
is that it limits signal swing and reduces AC power 
consumption. 


Characteristic Curves 


Figure 7 shows the typical supply current requirements 
over the operating temperature range of the 
processor at supply voltage (Vcc) of 5V. Figure 8 
shows the typical power supply current (Icc) required 
by the 80960KB at various operating frequencies 
when measured at three input voltage (Vcc) 
levels. 


For a given output current (IOL), the curve in Figure 9 
shows the worst case output low voltage (VOL). 


Low Drive Network: 
VOH = 3.42V 
IOL = 25.3 mA 
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Figure 10 shows the typical capacitive derating 
curve for the 80960KB measured from 1.5V on the 
system clock (CLK) to 1.5V on the falling edge and 
1.5V on the rising edge of the L-Bus address/data 
(LAD) signals. 


Test Load Circuit 


Figure 13 illustrates the load circuit used to test the 
80960KB's tri-state pins, and Figure 14 shows the 
load circuit used to test the open-drain outputs. The 
open-drain test uses an active load circuit in the form 
of a matched diode bridge. Since the open-drain outputs 
sink current, only the IOL legs of the bridge are 
necessary and the IOH legs are not used. When the 
80960KB driver under test is turned off, the output 
pin is pulled up to VREF (i.e., VOH). Diode D1 is 
turned off and the IOL current source flows through 
diode D2. 


When the 80960KB open-drain driver under test is 
on, diode D1 is also on, and the voltage on the pin 
being tested drops to VOL. Diode D2 turns off and 
IOL flows through diode D1. 


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High Drive Network: 
VOH = 3.41V 
IOL = 33.8 mA 


Figure 6. Connection Recommendations for Low and High Current Drive Networks 
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ABSOLUTE MAXIMUM RATINGS* 


Operating Temperature ........ 0°C to +85°C Case 
Storage Temperature .......... -65°C to +150°C 
Voltage on Any Pin ........... -0.5V to Vcc + 0.5V 
Power Dissipation ............ 2.5W (25 MHz) 


*WARNING: Stressing the device beyond the "Absolute 
Maximum Ratings" may cause permanent damage. 
These are stress ratings only. Operation beyond the 
"Operating Conditions" is not recommended and extended 
exposure beyond the "Operating Conditions" 
may affect device reliability. 


DC CHARACTERISTICS 


PGA: 
80960KB (16 MHz): Tcase = 0°C to +85°C, Vcc = 5V ±10% 
80960KB (20 and 25 MHz): Tcase = 0°C to +85°C, Vcc = 5V ±5% 


PQFP: 
80960KA (10 and 16 MHz): Tcase = 0°C to +100°C, Vcc = 5V ±10% 
80960KA (20 MHz): Tcase = 0°C to +100°C, Vcc = 5V ±5% 


Symbol   Parameter                     Value / Test Conditions      Notes 

VIL      Input Low Voltage 
VIH      Input High Voltage 
VCL      CLK2 Input Low Voltage 
VCH      CLK2 Input High Voltage       0.85 Vcc (Min) 
VOL      Output Low Voltage                                         (1, 5) 
VOH      Output High Voltage                                        (2, 4) 
ICC      Power Supply Current: 
           10 MHz                      300 mA 
           16 MHz                      375 mA 
           20 MHz                      420 mA 
           25 MHz                      480 mA 
ILI      Input Leakage Current 
ILO      Output Leakage Current        0.45 ≤ VO ≤ Vcc 
CIN      Input Capacitance             pF, fC = 1 MHz               (3) 
CO       I/O or Output Capacitance     pF, fC = 1 MHz               (3) 
CCLK     Clock Capacitance 
NOTES: 
1. For tri-state outputs, this parameter is measured at: 
   Address/Data ........ 4.0 mA 
   Controls ............ 5.0 mA 
2. This parameter is measured at: 
   Address/Data ........ -1.0 mA 
   Controls ............ -0.9 mA 
   ALE ................. -5.0 mA 
3. Input, output, and clock capacitance are not tested. 
4. Not measured on open-drain outputs. 
5. For open-drain outputs ........ 25 mA 
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AC SPECIFICATIONS 


This section describes the AC specifications for the 
80960KB pins. All input and output timings are specified 
relative to the 1.5V level of the rising edge. For 
output timings, the specifications refer to the time it 
takes the signal to reach 1.5V. 


For input timings, the specifications refer to the time 
at which the signal reaches (for input setup) or 
leaves (for hold time) the TTL levels of LOW (0.8V) 
or HIGH (2.0V). All AC testing should be done with 
input voltages of 0.4V and 2.4V, except for the 
clock (CLK2), which should be tested with input voltages 
of 0.45 Vcc and 0.55 Vcc. 
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Figure 11. Drive Levels and Timing Relationships for 80960KB Signals 
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Figure 12. Timing Relationship of L-Bus Signals 
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AC Specification Tables a 
80960KB AC Characteristics (10 MHz, PQFP Only) 


VIL = 10% Point 
= 1.2V : 


Vin = 90% Point 
= 0.1V.+ 0.5 Vcc 


Vin = 90% Point to 10% 
Point . 


50 
12. 
12°. 
10 Vin = 10% Point to 90% 
| : Point 
Bi 25 C, = 100 pF (LAD) 
| , ..CL = 75 pF (Controls) (2) 
4 | CL a | 
25 
2 
4 


Output Float 20 C. = 100 pF (LAD) 
Delay a C, = 75 pF (Controls) 


HOLDA Output 
- Float Delay 


7 
ae 
re caged 


| 143 Setup to ALE 10 Ci = 100 pF (LAD) 
Inactive CL = 75 pF (Controls) 
Hold after ALE C_ = 100 pF (LAD) 
Inactive , C_ = 75 pF (Controls) 


41 CLK2 Periods Minimum 


CL = 75 pF 


NOTES: 
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous. 
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be 
no longer than the valid delay. 
3. Clock rise and fall times are not tested. 
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80960KB AC Characteristics (16 MHz) 


Ty Processor Clock 

1 Period (CLK2) 
Processor Clock 
Low Time (CLK2) , 


Test Conditions 


Vin = 1.5V 


31.25 125 


Vit = 10% Point 


= 1.2V 
Processor Clock Vin = 90% Point 
High Time (CLK2) = 0.1V+ 0.5Vcc 
Processor Clock 1 S Vin = 90% Point to 10% 
Fall Time (CLK2) Point 
Processor Clock Vin = 10% Point to 90% 
Rise Time (CLK2) Point 


nN 


C, = 100 pF (LAD) 

C. = 75 pF (Controls) 

C. = 75pF | 

CL = 75pF EQ 
L = 75pF(2) fe 


C 
C. = 100 pF (LAD) 
C. = 75pF (Controls)(2) _ 


CL = 75 pF 


a 


HOLDA Output 
Valid Delay 
ALE Width 


ALE Output Valid Delay 


To 

T3 

T4 

Ts 

T6 

T6H 

17 

Tg 

Tg Output Float 2 

Delay 
| Tow HOLDA Output 
Float Delay 

T10 
TW44 
T42 
143 
44 
T45 
Ti6 
147 


4 
1 


2 
5 


NO 
io) 


ND — 
Oo —+ on _ © o) 


ie) 
io) 


Input Setup 1 


Output Valid | 
Delay 


4 
0 


| 3 
| 4 
Tie | imputSeup2 
al = Salle 
Inactive | 

Ln lll 

| Inactive 

[Ts | 3 
Tie 5 


C= 100 pF (LAD) 

CL = 75 pF (Controls) 
C. = 100 pF (LAD) | 
CL = 75 pF (Controls) 


” 


Reset Hold ery 
| Tz | Reset Width 1281 41 CLK2 Periods Minimum 


NOTES: 
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous. 
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be 
no longer than the valid delay. 
3. Clock rise and fall times are not tested. 
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Test Conditions 


ons Vin = 1.5V 
Vv, 


L = 10% Point 
= 1.2V 
Processor Clock Vin = 90% Point 
High Time (CLK2) = 0.1V+0.5Vcc 
Processor Clock 10 Vin = 90% Pointto10% 
Fall Time (CLK2) . Point . 
Processor Clock 10 Vin = 10% Point to 90% ; 
Rise Time (CLK2) Point | 
| Output Valid 2 20 CL = 60pF(LAD) | 
Delay Cy. = 50 pF (Controls) 
HOLDA Output | 4 26 
Valid Delay. | 
= ae | 20 | ns | 
ee! ee ee 


|__symbot_ | Parameter | Min 


Processor Clock — 25 
Period (CLK2) 

_ Processor Clock 
Low Time (CLK2) 


L 
Cy = 50 pF 
Cy = 50 pF | 
ALE Output Valid Delay _ a a C. = 50 pF(2) — 


C= 60 pF (LAD) 


Output Float 2 ) 
C, = 50 pF (Controls)(2) 


- Delay 


Ty 

T2 

T3 

14 

Ts 

Tg 

Té6H | 

T7. 

Tg 

Tg - 

ToH 

Tio = 
eer Input Hold | ia ae 
HOLD Input Hold 
143 Setup to ALE | 

Inactive 

T14 ~ Hold after ALE 

ma Inactive 
tad 


Cy = 60 pF (LAD) 
C. = 50 pF (Controls) 


CL = 60pF(LAD) — 
CL = 50 pF (Controls) — 


41 CLK2 Periods Minimum 
NOTES: 
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous. 
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be 
no longer than the valid delay. 
3. Clock rise and fall times are not tested. 


Reset Hold 
| 
Reset Width 


Figure 13. Test Load Circuit for Tri-State Output Pins 


Figure 14. Test Load Circuit for Open-Drain Output Pins 
IOL tested at 25 and 40 mA 
VREF = Vcc 
D1 and D2 are matched 
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— 
80960KB AC Characteristics (25 MHz, PGA Only)

Symbol | Parameter                         | Min | Max | Units | Test Conditions
T1     | Processor Clock Period (CLK2)     | 20  | 125 | ns    | VIN = 1.5V
T2     | Processor Clock Low Time (CLK2)   | 5   |     | ns    | VIL = 10% Point = 1.2V
T3     | Processor Clock High Time (CLK2)  |     |     |       | VIH = 90% Point = 0.1V + 0.5 VCC
T4     | Processor Clock Fall Time (CLK2)  |     |     |       | VIN = 90% Point to 10% Point
T5     | Processor Clock Rise Time (CLK2)  |     | 10  |       | VIN = 10% Point to 90% Point
T6     | Output Valid Delay                | 2   | 18  | ns    | CL = 60 pF (LAD), CL = 50 pF (Controls)
T6H    | HOLDA Output Valid Delay          | 4   | 24  | ns    | CL = 50 pF
T7     | ALE Width                         |     |     |       | CL = 50 pF
T8     | ALE Output Valid Delay            | 0   |     |       | CL = 50 pF (2)
T9     | Output Float Delay                | 2   | 18  |       | CL = 60 pF (LAD), CL = 50 pF (Controls)
T9H    | HOLDA Output Float Delay          | 4   | 20  |       | CL = 50 pF
T11    | Input Hold                        |     |     |       |
T13    | Setup to ALE Inactive             |     |     |       |
T14    | Hold after ALE Inactive           |     |     |       | CL = 60 pF (LAD), CL = 50 pF (Controls)
T17    | Reset Width                       | 820 |     |       | 41 CLK2 Periods Minimum

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Clock rise and fall times are not tested.
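
The Reset Width entry can be cross-checked against the clock period: 41 CLK2 periods at the 20 ns minimum CLK2 period gives the 820 listed in the Min column. A minimal Python sketch of that arithmetic follows; the function name is illustrative and does not come from the data sheet.

```python
def reset_width_ns(clk2_period_ns, periods=41):
    """Minimum RESET width as a whole number of CLK2 periods (41 per the table)."""
    return periods * clk2_period_ns

# 25 MHz part: the minimum CLK2 period (T1) is 20 ns.
print(reset_width_ns(20))  # 820
```

A slower CLK2 lengthens the requirement proportionally, so the width is best budgeted in CLK2 periods rather than as a fixed time.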
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HIGH LEVEL (MIN) 0.55V¢¢ 


LOW LEVEL (MAX) 0.8V 
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OUTPUTS 


INIT PARAMETERS (BADAC, IAC) MUST BE SET UP 8 CLOCKS
PRIOR TO THIS CLK2 EDGE
INIT PARAMETERS MUST BE HELD BEYOND THIS CLK2 EDGE
T16 = RESET SETUP

270565-7
Figure 16. RESET Signal Timing 
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Figure 17. Hold Timing 


Design Considerations 


Input hold times can be disregarded by the designer 
whenever the input is removed because a subse- 
quent output from the processor is deasserted (e.g., 
DEN becomes deasserted). 


Whenever the processor generates an output that
indicates a transition into a subsequent state, any
outputs that are specified to be tri-stated in this new
state are guaranteed to be tri-stated. For example, in
the Td cycle following a Ta cycle for a read, the
minimum output delay of DEN is 2 ns, but the maximum
float time of LAD is 20 ns. When DEN is asserted,
however, the LAD outputs are guaranteed to have
been tri-stated.


Designing for the ICE-960KB 


The 80960KB In-Circuit Emulator assists in debugging
both 80960KA and 80960KB hardware and
software designs. The product consists of a probe
module, cable, and control unit. Because of the high
operating frequency of 80960KB systems, the probe
module connects directly to the 80960KB socket.


When designing an 80960KB hardware system that
uses the ICE-960KB to debug the system, several
electrical and mechanical characteristics should be
considered. These considerations include capacitive
loading, drive requirement, power requirement and
physical layout.


The ICE-960KB probe module increases the load
capacitance of each line by up to 25 pF. It also adds
one standard Schottky TTL load on the CLK2 line,
up to one advanced low-power Schottky TTL load
for each control signal line, and one advanced
low-power Schottky TTL load for each address/data and
byte enable line. These loads originate from the
probe module and are driven by the 80960KB
processor.


To achieve high noise immunity, the ICE-960KB 
probe is powered by the user’s system. The high- 
speed probe circuitry draws up to 1.1A plus the max- 
imum current (Icc) of the 80960KB processor. 


The mechanical considerations are shown in Figure 
18, which illustrates the lateral clearance require- 
ments for the ICE-960KB probe as viewed from 
above the socket of the 80960KB processor. 
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(Drawing: view from above the user CPU socket, showing the emulation processor, ICE processor module, ribbon cable connector, and cable to the ICE control unit. Vertical clearance: 1.2". Minimum cable bend radius: less than 3.0".)

Figure 18. ICE-960KB Lateral Clearance Requirements


MECHANICAL DATA 


Package Dimensions and Mounting 


The 80960KB is available in two different packages:
a 132-lead ceramic pin-grid array (PGA) and a 132-lead
plastic quad flat pack (PQFP). Pins in the ceramic
package are arranged 0.100 inch (2.54 mm)
center-to-center, in a 14 by 14 matrix, three rows
around. (See Figure 19.) The plastic package uses
fine-pitch gull wing leads arranged in a single row
along the perimeter of the package with 0.025 inch
(0.64 mm) spacing. (See Figure 20.) Dimensions are
given in Figure 21 and Table 7.


There is a wide variety of sockets available for the
ceramic PGA package, including low-insertion or
zero-insertion force mountings, and a choice of
terminals such as soldertail, surface mount, or wire
wrap. Several applicable sockets are shown in
Figure 22.


The PQFP is normally surface mounted to take best
advantage of the plastic package's small footprint
and low cost. In some applications, however,
designers may prefer to use a socket, either to improve
heat dissipation or reduce repair costs. Figures 23a
and 23b show two of the many sockets available.


Pin Assignment 

The PGA and PQFP have different pin assignments.
Figure 24 shows the view from the bottom of the
PGA (pins facing up) and Figure 25 shows a view
from the top of the PGA (pins facing down). Figures
20 and 32 show the top view of the PQFP; notice
that the pins are numbered in order from 1 to 132
around the package's perimeter. Tables 5 and 6 list
the function of each pin in the PGA, and Tables 8
and 9 list the function of each pin in the PQFP.


VCC and GND connections must be made to multiple
VCC and GND pins. Each VCC and GND pin must
be connected to the appropriate voltage or ground
and externally strapped close to the package. We
recommend that you include separate power and
ground planes in your circuit board for power
distribution.


NOTE:
Pins identified as N.C., "No Connect," should never
be connected.
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Package Thermal Specification 


The 80960KB is specified for operation when case
temperature is within the range 0°C to +85°C (PGA)
or +100°C (PQFP). The case temperature should
be measured at the top center of the package as
shown in Figure 26.


The ambient temperature can be calculated from θJC
and θJA by using the following equations:

TJ = TC + P*θJC
TA = TJ - P*θJA


Values for θJA and θJC are given in Table 10 for the
PGA package and in Table 11 for the PQFP for various
airflows. Note that the θJA for the PGA package
can be reduced by adding a heatsink, while a heatsink
is not generally used with the plastic package
since it is intended to be surface mounted. The maximum
allowable ambient temperature (TA) permitted
without exceeding TC is shown by the charts in
Figures 27 through 30 for 10 MHz, 16 MHz, 20 MHz,
and 25 MHz respectively.
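
The two thermal equations above can be chained to estimate the highest ambient temperature for a given case limit. The Python sketch below is only an illustration: the supply current and the case-to-ambient resistance are assumed example values rather than figures read from the charts, and the function names are hypothetical. It also uses the relation θJA = θJC + θCA given in the notes to the thermal tables.

```python
def junction_temp(t_case_c, power_w, theta_jc):
    """TJ = TC + P * theta_JC."""
    return t_case_c + power_w * theta_jc

def ambient_temp(t_junction_c, power_w, theta_ja):
    """TA = TJ - P * theta_JA."""
    return t_junction_c - power_w * theta_ja

# Illustrative inputs only (assumed, not taken from the charts).
vcc_v, icc_a = 5.0, 0.42          # supply voltage (V) and supply current (A)
power_w = vcc_v * icc_a           # dissipated power, about 2.1 W
theta_jc, theta_ca = 2.0, 15.0    # assumed thermal resistances in C/W
theta_ja = theta_jc + theta_ca    # junction-to-ambient resistance

t_j = junction_temp(85.0, power_w, theta_jc)   # case held at the +85 C PGA limit
t_a = ambient_temp(t_j, power_w, theta_ja)
print(f"TJ = {t_j:.1f} C, max TA = {t_a:.1f} C")  # TJ = 89.2 C, max TA = 53.5 C
```

Because θJA = θJC + θCA, the result reduces to TA = TC - P*θCA, which is why lowering the case-to-ambient resistance with airflow or a heatsink raises the permitted ambient temperature.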


The curves assume the maximum permitted supply
current (ICC) at each speed, VCC of 5.0V, and a
TCASE of +85°C (PGA) or +100°C (PQFP).


If you will be using the 80960KB in a harsh environment
where the ambient temperature may exceed
the limits for the normal commercial part, you should
consider using an extended temperature part. These
parts are designated by the prefix "TA" and are available
at 16, 20 and 25 MHz in the ceramic PGA package.
The extended operating temperature range is
-40°C to +125°C case. Figure 31 shows the maximum
allowable ambient temperature for the 20 MHz
extended temperature TA80960KB at various airflows.
The curve assumes an ICC of 420 mA, VCC of
5.0V, and a TCASE of +125°C.


WAVEFORMS 


Figures 33 through 38 show the waveforms for various
transactions on the 80960KB's local bus.


SUPPORT COMPONENTS 


85C960 Burst Bus Controller 


The Intel 85C960 performs burst logic, ready generation,
and address decode for the 80960KA and
80960KB. The burst logic supports both standard
and burst mode memories and peripherals. The
ready generation and timing control supports 0 to 15
wait states across eight address ranges for read/write
and burst accesses. The address decoder decodes
eight address inputs into four external and
four internal chip selects. The wait state and chip
select values may be programmed by the user; the
timing control and burst logic are fixed.


The 85C960 operates with the 80960KA and
80960KB at all frequencies and consumes only 50
mA at 25 MHz. The 85C960 is housed in 28-pin,
300-mil ceramic DIP and plastic DIP packages or a
28-pin PLCC package for surface mount. In the ceramic
DIP package the part is UV-erasable, which makes it
easy to revise designs. Order the 85C960 data sheet
(No. 290192) for full details.


27960KX Burst Mode EPROM 


The Intel 27960KX one-megabit EPROM is designed
specifically to support the 80960KA and 80960KB. It
uses a burst interface to offer near zero wait-state
performance without the high cost of alternative
memory technologies. The 27960KX removes the
need for "dumping" code and data stored in slow
EPROMs or ROMs into expensive high-speed
"shadow" RAM.


Internally, the 27960KX is organized in blocks of four
bytes that are accessed sequentially. The address
of the four-byte block is latched and incremented
internally. After a set number of wait-states (1 or 2),
data is output one word at a time each subsequent
clock cycle. High-performance outputs provide zero
wait-state data-to-data burst accesses. Extra power
and ground pins dedicated to the output reduce the
effect of fast output switching on the device. The
27960KX offers 1-0-0-0 performance at 20 MHz and
2-0-0-0 performance at 25 MHz. Full details can be
found in the 27960KX data sheet (No. 290337).
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Figure 19. A 132-Lead Pin-Grid Array (PGA) Used to Package the 80960KB
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Figure 20. The 132-Lead Plastic Quad Flat Pack (PQFP) Used to Package the 80960KB


Figure 21a. Principal Dimensions of the 132-Lead PQFP (270565-34)


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Figure 21b. Details of the Molding of the 132-Lead PQFP
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Figure 21c. Terminal Details for the 132-Lead PQFP (270565-36)




Figure 21d. Board Footprint Area for the 132-Lead PQFP (270565-37)


Table 7. Package Dimension: 80960KB PQFP


ee 
- | Min | Max | Min | Max 


Sr 
TA | Packageteint | ate | aro | 4060 | aazo 
[ai | Standot ————~—~—~*?~Caceao | oca0 | oso | o700 
[oe [Terminal Dimension [1.076 | 088 | evsto | 27.560 


Bumper Distance 
27.860 
27.860 


Without Flash > 1.097 
With Flash 1.097 
D4, E4 | Foot Radius Location | 1.023 | 1.037 | 25.890 | 26.330
L1 | Foot Length | 0.020 | 0.030 | 0.510 | 0.760


24.210 


28.010 
28.190 
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Amp Incorporated
(Harrisburg, PA 17105 U.S.A., Phone 717-564-0100)

• Low insertion force (LIF) soldertail 55274-1
• Amp tests indicate 50% reduction in insertion force compared to machined sockets

Other socket options:

• Zero insertion force (ZIF) soldertail 55583-1
• Zero insertion force (ZIF) Burn-in version 55573-2

Cam handle locks in low profile position when 80960KB is installed
(handle UP for open and DOWN for closed positions).
Courtesy Amp Incorporated (270565-13)


Advanced Interconnections
(5 Division Street, Warwick, RI 02818 U.S.A., Phone 401-885-0485)

Peel-A-Way* Mylar and Kapton Socket Terminal Carriers

• Low insertion force surface mount CS132-37TG
• Low insertion force soldertail CS132-01TG
• Low insertion force wire-wrap CS132-02TG (two-level), CS132-03TG (three-level)
• Low insertion force press-fit CS132-05TG

Peel-A-Way Carrier No. 132: Kapton Carrier is KS132; Mylar Carrier is MS132.
Molded Plastic Body KS132 is shown (.100 TYP, 14 x 14 x 3 rows).
Courtesy Advanced Interconnections (Peel-A-Way Terminal Carriers, U.S. Patent No. 4442938) (270565-15)

*Peel-A-Way is a trademark of Advanced Interconnections.


Figure 22. Several Socket Options for Mounting the 80960KB
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Part Number: 1-821932-5 


— 270565-38 


Figure 23a. AMP Micropitch Socket for the 132-Lead Plastic 
Quad Flat Pack, 0.025” Lead Spacing, Gull Wing Leads 
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Part Number:
2. 0132- 07244- 000-018007 


© 


 270565-46 


Figure 23b. 3M Company PQFP Socket and Lid 
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Figure 24. 80960KB PGA Pinout—View from Bottom (Pins Facing Up) 
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Figure 25. 80960KB PGA Pinout—View from Top (Pins Facing Down) 
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Table 5. 80960KB PGA Pinout—In Pin Order 
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CC 

Vss 

LAD49 | 

LAD17 

LADig 

LAD14 

LAD44 

LADg | 

LAD7 

LADs 

LAD, 

LAD, 

INTo/INTR | Ki | BEg — 

Vec | 13 |_NG. 
LAD2g pi4_ | NC. | Ks | Vss 

LADo« | | Veo.” 
_ LADo22 | 
LADoy 




LADi5 
LAD45 
LADio 
LADg 
LAD> 
CLK2 
-LADo 


RESET | N.C. 




HOLD/HLDAR |. a Veo N.C. 
N.C. 




Voc 
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Table 6. 80960KB PGA Pinout—in Signal Order 
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MEASURE PGA CASE TEMPERATURE MEASURE PQFP TEMPERATURE AT 
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Figure 26. Measuring 80960KB PGA and PQFP Case Temperature 
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Figure 28. 16 MHz 80960 K-Series Maximum Allowable Ambient Temperature 
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Figure 29. 20 MHz 80960 K-Series Maximum Allowable Ambient Temperature 
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Figure 30. Maximum Allowable Ambient Temperature for 
the 80960KB at 25 MHz (available in PGA only) 
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Figure 31. Maximum Allowable Ambient Temperature for the Extended 
Temperature TA-80960KB at 20 MHz (available in PGA only) 
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Figure 32. 80960KB PQFP Pinout—View from Top 
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| Table 9. 80960KB Plastic Package Pinout—In Signal Order 
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Table 10. 80960KB PGA Package Thermal Characteristics 


Thermal Resistance—°C/Watt 


ed Airflow—ft./min (m/sec) 
eevee + 100 400 | 600 | 800 
foc (0.50) (2.03) | (3.04) | (4.06) 
|@ Junction-to-Case | 
(Case Measured 2 “2 72 2 2 
as shown in Figure 26) 
@ Case-to-Ambient } | ; ee 
were [| ef 7s] el} ole | Tf 
6 Case-to-Ambient : a eilebort? 
(with Omnidirectional 14 | 12 7 os * | | 
6 Case-to-Ambient | | | | 
(with Unidirectional) 15; 14 13° 11 
Heatsink) | 


Heatsink) 

NOTES:
1. This table applies to 80960KB PGA plugged into socket or soldered directly into board.
2. θJA = θJC + θCA.
3. θJ-CAP = 4°C/W (approx.)
   θJ-PIN = 4°C/W (inner pins) (approx.)
   θJ-PIN = 8°C/W (outer pins) (approx.)


- Table 11. 80960KB PQFP Package Thermal Characteristics 


PQFP Thermal Resistance—°C/Watt . | | 


- Airflow—ft./min (m/sec) 

100 400 ; 600 | 800 

(0.50) (2.03) | (3.04) | (4.06) 
6 Junction-to-Case | 
(Case Measured | , 
as shown in Figure 26) | 3 
6 Case-to-Ambient 
(No Heatsink) ze] w]e] we] [e |e 


Parameter 


NOTES:
1. This table applies to 80960KB PQFP soldered directly into board.
2. θJA = θJC + θCA.
3. θJL = 18°C/Watt
   θJB = 18°C/Watt
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Figure 33. Read Transaction | 
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Figure 35. Burst Read Transaction 
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Figure 36. Burst Write Transaction with One Wait State 
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NOTE: 
INTR can go low no sooner than 5 ns (input hold time) following the beginning of interrupt acknowledgement cycle 1.
For a second interrupt to be acknowledged, INTR must be low for at least three cycles before it can be reasserted.


Figure 37. Interrupt Acknowledge Transaction
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Figure 38. Bus Exchange Transaction (PBM = Primary Bus Master, SBM = Secondary Bus Master) 
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1.0 PURPOSE 


The 80960CA Product Overview is a summary of the 
features and operation of Intel’s 80960CA Embedded 
Processor. The Product Overview is intended for those 
who are not familiar with the 80960 architecture or the 
80960CA, a product built around this architecture. The 
80960CA Product Overview provides a programmer or 
a system designer with a quick, global view of software 
and hardware design considerations for the 80960CA. 
For further information, refer to the following refer- 
ence documents: 


— The 80960CA User's Manual contains detailed technical
information and examples for designing embedded
systems using the 80960CA.


— The 80960CA Data Sheet provides electrical specifi- 
cations for the device, such as the DC and AC pa- 
rameters, operating conditions, and packaging spec- 
ifications. 


2.0 80960CA 32-BIT EMBEDDED 
PROCESSOR 


The 80960CA (Figure 2-1) is designed for embedded
processing applications. This product features the
high-performance C-Series core plus built-in system
peripherals, effectively integrating a high-speed CPU and
system components onto a single silicon die. The 80960CA
is a member of Intel's 80960 embedded processor family.
Each member of the 80960 family is based on a
common architectural definition referred to as the core
architecture.


An 80960 family member, such as the 80960CA, is
made up of an implementation of the core architecture
plus application-specific extensions. These extensions
may consist of integrated peripherals, instruction-set
extensions, or additional registers and caches beyond
those defined by the architecture. The common core
architecture provides a basis for code compatibility for
all 80960 family products, while application-specific
extensions optimize a particular product for a class of
applications.


The 80960 architectural target is the execution of multiple
instructions per clock (i.e., fractional clocks per
instruction). By defining an architecture which supports
parallel instruction execution and out-of-order
instruction execution, performance advances are not
constrained by the system clock.


The 80960CA is capable of launching and executing
instructions in parallel. This is accomplished by the use
of advanced silicon technology as well as innovative
“microarchitectural” constructs. The term microarchi- 


ADVANCE INFORMATION 
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tecture refers to the implementation of the instruction 
set and programming resources. For example, different 
microarchitectures may have different pipeline con- 
struction, internal bus widths, register set porting, de- 
grees of parallelism, and cache parameterization (two- 
way, four-way, etc.). 


A principal objective of the 80960 architecture is to 
provide the framework to allow microarchitectural ad- 
vances to translate directly into increased performance 
without architectural limitations. 


(Block diagram: C-Series Core, DMA Controller, Bus Control Unit, Interrupt Unit)
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Figure 2-1. 80960CA 


2.1 80960 Architecture 


Embedded applications are cost sensitive, require a dif- 
ferent mix of instructions than reprogrammable appli- 
cations, have demanding interrupt response require- 
ments, and often use real-time executives rather than 
full-blown operating systems. The 80960 architecture 
was developed with these factors in mind. Several key 
optimizations which are provided by the architecture 
are explained below. 


Instruction Set: Powerful Boolean operations are provided.
Frequently executed functions are available as
single instructions for greater code density and performance.
Call, Return, Compare-and-Branch,
Conditional-Compare, Compare-and-Increment or Decrement,
and Bit-Field-Extract are each single instructions.


Interrupts: A priority interrupt structure simplifies the 
management of real-time events. With 31 discrete levels 
of priority and 248 possible interrupt-handling proce- 
dures, this structure provides the low latency and high 
throughput interrupt handling required in embedded 
processor applications. 
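
The two counts above are related: 248 handlers spread over 31 priority levels is eight handlers per level. The Python sketch below illustrates one numbering that produces exactly those counts. It assumes vectors numbered 8 through 255 with priority equal to the vector number divided by 8; that numbering is an assumption made here for illustration and is not stated in this overview.

```python
FIRST_VECTOR = 8     # assumed lowest usable vector number
LAST_VECTOR = 255    # assumed highest vector number (8-bit vector)

def priority(vector):
    """Priority level implied by a vector number under the assumed numbering."""
    if not FIRST_VECTOR <= vector <= LAST_VECTOR:
        raise ValueError(f"vector {vector} is outside the assumed range")
    return vector // 8

vectors = range(FIRST_VECTOR, LAST_VECTOR + 1)
levels = {priority(v) for v in vectors}
print(len(vectors), len(levels))  # 248 31
```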
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Faults: A generalized fault-handling mechanism simpli- 
fies the task of detecting errant arithmetic calculations 
or other conditions that typically require a significant 
amount of in-line user code. 


Application-Specific Extensions: The core architecture 
is designed to accept application-specific extensions 
such as instruction set extensions (e.g., string functions, 
floating point), special purpose registers, larger caches, 
on-chip program and data memory, a memory manage- 
ment and protection unit, fault-tolerance support, mul- 
tiprocessing support, and real-time peripherals (DMA, 
serial ports, etc.). 


2.2 80960 C-Series Core 


The C-Series core is an implementation of the 80960
core architecture. The core can execute instructions at
a sustained speed of 66 MIPS(1), with bursts of performance
up to 99 MIPS. To achieve this level of performance,
Intel has incorporated state-of-the-art silicon
technology and innovative microarchitectural constructs
into the C-Series core. Factors which contribute
to the core's performance are listed below.


— Parallel instruction decoding allows the 80960CA
to start two instructions in every clock, with bursts
of three instructions per clock.

— Most instructions execute in a single clock cycle.

— Multiple independent execution units enable
overlapping instruction execution.

— Advanced silicon technology allows operation with
a 33 MHz internal clock.

— Efficient instruction pipeline is designed to minimize
pipeline break losses.

— Register and resource scoreboarding transparently
manage parallel execution.

— Branch look-ahead feature enables branches to execute
in parallel with other instructions.

— Local register cache is integrated on-chip.

— 1 Kbyte two-way set associative instruction cache is
integrated on-chip.

— 1 Kbyte Static Data RAM is integrated on-chip.


These factors combine to make the 80960CA an
ultra-high performance computing engine.


NOTE:
1. Single clock instructions at 33 MHz.
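
The MIPS figures quoted for the core follow from the issue rate and the clock: two single-clock instructions per clock at 33 MHz is 66 MIPS, and a burst of three per clock is 99 MIPS. A small Python sketch of that arithmetic (the function name is illustrative):

```python
CLOCK_MHZ = 33  # internal clock frequency from the feature list

def mips(instructions_per_clock, clock_mhz=CLOCK_MHZ):
    """Million instructions per second for single-clock instructions."""
    return instructions_per_clock * clock_mhz

print(mips(2), mips(3))  # 66 99
```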


2.3 80960CA System Peripherals


The 80960CA features several extensions to the core 
architecture in the form of integrated peripherals. 
These peripherals are intended to reduce the external 
system requirements needed for embedded applications. 
These peripherals are described below. 
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Bus Controller Unit: A 32-bit high-performance bus 
controller interfaces the 80960CA to external memory
and peripherals. The bus controller transfers instructions
or data at a maximum rate of 132 Mbytes per
second.(2) Internally programmable wait states and 16
separately configurable memory regions allow the bus
controller to interface with a variety of memory subsystems
with minimum system complexity and maximum
performance.
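
The 132 Mbytes per second figure is consistent with a 32-bit bus moving one word per clock at 33 MHz. The Python sketch below shows the calculation; the one-transfer-per-clock assumption corresponds to a zero wait state, pipelined burst access, and the function name is illustrative.

```python
def peak_bandwidth_mbytes(bus_width_bits, clock_mhz, transfers_per_clock=1.0):
    """Peak transfer rate in Mbytes per second."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock

print(peak_bandwidth_mbytes(32, 33))  # 132.0
```

Wait states lower the effective transfers per clock, so accesses to slower memory regions see proportionally less than this peak.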


DMA Controller: A four channel DMA controller per- 
forms high speed data transfers between peripherals 
and memory. The DMA controller provides advanced 
features such as data chaining, byte assembly and disas- 
sembly, and a fly-by mode capable of transfer speeds of 
up to 66 Mbytes per second. The DMA controller fea- 
tures a performance and flexibility which is only possi- 
ble by integrating the DMA controller and the 
80960CA core. 


Interrupt Controller: A priority interrupt controller
manages 8 external interrupt inputs, 4 internal interrupt
sources from the DMA controller, and a single
non-maskable interrupt input (NMI). A total of 248
external interrupt sources are supported by the interrupt
controller by configuring the 8 external interrupt
pins as an 8-bit input port. The interrupt controller provides
the mechanism for the low latency and high
throughput interrupt service featured by the 80960CA.
The interrupt latency for the 80960CA is typically less
than 1 µs.


3.0 EXECUTION ENVIRONMENT 


The Execution Environment (Figure 3-1) refers to the 
resources which are available for executing code on the 
80960CA. The following sections describe the elements
of the execution environment.


3.1 Registers and Literals 


The 80960CA provides four types of working data registers:
Global Registers, Local Registers, Special Function
Registers (SFRs), and Control Registers.


Global and local registers are general purpose 32-bit
data registers. The SFRs and the control registers provide
a programmer's interface to the on-chip peripherals
(i.e., the DMA controller, interrupt controller, and
bus controller).


NOTE:
2. 33 MHz internal clock, load or instruction fetch on
0 wait state, pipelined burst bus.
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Figure 3-1. Execution Environment 


The 80960 architecture is a register-oriented architec- 
ture. That is, operands and results of instructions are 
placed in working data registers rather than in memory. 
Since the architecture is register oriented, an ample 
supply of registers is provided. The architecture’s work- 
ing register set consists of 16, 32-bit global registers and 
16, 32-bit local registers. 


3.1.1 GLOBAL AND LOCAL REGISTERS 


The procedure call and return mechanism, which is
part of the 80960 architecture, inspires the names given
to the local and global registers. When a procedure call
or return is executed, the contents of global registers
are preserved across procedure boundaries. In other
words, the same set of global registers is used for each
procedure. A new set of local registers, however, is
allocated for each procedure. The 80960's call and return
mechanism is explained in Section 3.8.


The 80960CA supplies 16, 32-bit global registers designated
g0 through g15. Registers g0 through g14 are
general purpose global registers. Register g15 is reserved
for the current Frame Pointer. This register is
available in assembly language as the fp register. The fp
contains the address of the first byte in the current
stack frame. The fp register and the stack frame are
described in Section 3.8.


The 80960CA supplies 16, 32-bit Local Registers designated
r0 through r15. Registers r3 through r15 are general
purpose local registers. Registers r0, r1, and r2 are
reserved for special functions as follows: r0 contains the
Previous Frame Pointer, r1 contains the Stack Pointer,
and r2 is reserved for the Return Instruction Pointer.
These registers are available in assembly language as,
respectively, the pfp, sp, and rip registers. The pfp, sp,
and rip registers manage stack frame linkage for the
80960's procedure call and return mechanism. The
function of these registers is described in Section 3.8.
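
The register naming described in this section can be summarized as a small lookup table. The Python sketch below is only an illustration of the aliases named in the text (fp, pfp, sp, rip); the variable and function names are hypothetical.

```python
# Assembly-language aliases for the reserved registers named in the text.
ALIASES = {"pfp": "r0", "sp": "r1", "rip": "r2", "fp": "g15"}

GLOBAL_REGISTERS = [f"g{i}" for i in range(16)]   # g0 through g15
LOCAL_REGISTERS = [f"r{i}" for i in range(16)]    # r0 through r15

def resolve(name):
    """Map an alias such as 'sp' to its register; other names pass through."""
    return ALIASES.get(name, name)

reserved = set(ALIASES.values())
general_purpose = [r for r in GLOBAL_REGISTERS + LOCAL_REGISTERS
                   if r not in reserved]
print(resolve("fp"), len(general_purpose))  # g15 28
```

The count of 28 general purpose registers is g0 through g14 (15 registers) plus r3 through r15 (13 registers).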


3.1.2 SPECIAL FUNCTION REGISTERS AND 
CONTROL REGISTERS 


The 80960CA uses 3 Special Function Registers (SFRs)
for communicating with on-chip peripherals. These
SFRs are an architectural extension specific to the
80960CA. The SFRs on the 80960CA are designated as
sf0, sf1, and sf2. SFRs are accessed as source operands
by most of the 80960CA's instructions. The registers
serve as part of the programmer's interface to the
DMA and interrupt controller.
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Control registers, like SFRs are used to communicate 
with the on-chip peripherals. Configuration informa- 
tion for the peripherals is generally stored in these reg- 
isters. Control registers can only be accessed by using 
the system control (sysctl) instruction. The sysctl 
instruction is used to load the internal control register 
from a table in external memory called the control ta- 
ble. In order to simplify the process of peripheral con- 
figuration, the control registers are automatically load- 
ed from this table at initialization. 


3.1.3 LITERALS 


The 80960CA provides literals which may be used in 
the place of source register operands in most instruc- 
tions. The literals range from 0 to 31 (5 bits). When a 
literal is used as an operand, the processor expands it to 
32 bits by adding leading zeros. If the instruction de- 
fines an operand larger than 32 bits, the processor zero 
extends the literal to the operand size. 
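
The literal expansion rule can be modeled in a few lines. This Python sketch assumes only what the paragraph states: a literal is a 5-bit value from 0 to 31, and it is zero-extended to the operand width. The function name is illustrative.

```python
def expand_literal(literal, operand_bits=32):
    """Zero-extend a 5-bit literal (0 to 31) to the operand width."""
    if not 0 <= literal <= 31:
        raise ValueError("literals range from 0 to 31")
    mask = (1 << operand_bits) - 1
    return literal & mask  # upper bits are all zero

print(f"{expand_literal(31):08X}")      # 0000001F
print(f"{expand_literal(5, 64):016X}")  # 0000000000000005
```

Because the extension is with zeros, a literal can never represent a negative operand.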


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3.2 Address Space and Memory 


The address space of the 80960CA (Figure 3-2) is considered
a subset of the execution environment since the
code, data, data structures, and external peripherals for
the processor reside here. The 80960 family has an address
space which is 2^32 bytes (4 Gbytes) in size. This
address space is linear (unsegmented); therefore, code,
data, and peripherals may be placed anywhere in the
usable space. For the 80960CA, some memory locations
are reserved or are assigned special functions as
shown in Figure 3-2.


3.2.1 INTERNAL DATA RAM 

The 80960CA provides 1 Kbyte of internal static RAM
for fast access of frequently used data. The data RAM
allows time critical data storage and retrieval, with no
dependence on the performance of the external bus.
Any load or store, including quad-word
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Figure 3-2. Address Space 
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operations, execute in a single clock cycle when direct- 
ed to internal data RAM. The data RAM is located at 
address 00H in the processor’s address space. When the 
DMA controller is in use, 32 bytes of data RAM are 
reserved for each active DMA channel. Additionally, 
64 bytes of data RAM are reserved for 16 interrupt 
vectors which may be cached internally to reduce interrupt
latency. The data RAM reserved for the DMA controller and the
interrupt controller can be used for additional data storage
when these peripherals are not used.
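
The reservations described above can be tallied with a short calculation. This is only an illustrative sketch: the function and constant names are invented for the example, and the sizes are the ones quoted in the text (1 Kbyte total, 32 bytes per active DMA channel, 64 bytes for the cached interrupt vectors).

```python
DATA_RAM_BYTES = 1024        # 1 Kbyte of internal data RAM
DMA_BYTES_PER_CHANNEL = 32   # reserved for each active DMA channel
VECTOR_CACHE_BYTES = 64      # reserved for 16 cached interrupt vectors


def free_data_ram(active_dma_channels: int, cache_vectors: bool) -> int:
    """Return the bytes of data RAM left for general data storage."""
    if active_dma_channels < 0:
        raise ValueError("channel count cannot be negative")
    reserved = active_dma_channels * DMA_BYTES_PER_CHANNEL
    if cache_vectors:
        reserved += VECTOR_CACHE_BYTES
    return DATA_RAM_BYTES - reserved


if __name__ == "__main__":
    # Two active DMA channels plus the interrupt vector cache.
    print(free_data_ram(2, True))
```

When neither peripheral is used, the full 1 Kbyte remains available, which matches the last sentence of the paragraph above.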


Two execution modes are possible on the 80960CA, 
user mode or supervisor mode. These modes are used to 
implement a protection model in which system data 
structures are isolated from user code. As shown in 
Figure 3-2, the first 256 bytes of data RAM are always 
write protected when a program is executing in user 
mode but may always be written when executing in 
supervisor mode. The remainder of the data RAM can be programmed
for this protection feature. The user and supervisor modes are
described further in Section 3.7.


3.2.2 RESERVED ADDRESS SPACE


The upper 16 Mbytes of memory (FF000000H to FFFFFFFFH) are
reserved for specific functions and extensions to the 80960
architecture. The 12 words in reserved space (FFFFFF00H to
FFFFFF2CH) are used to start up the processor when it comes out
of reset. These 12 words are called the initialization boot
record.


3.2.3 ARCHITECTURALLY DEFINED DATA STRUCTURES


To execute a program on the 80960CA, data structures specific to
the 80960 architecture must reside in the processor’s address
space. Architecture-defined data structures include stacks,
initialization structures, and various procedure entry tables.
These data structures may generally be located anywhere in the
address space. Pointers to each data structure are specified
when the 80960CA is initialized. The architecture-defined data
structures include:

- User Stack
- Interrupt Stack
- Interrupt Table
- System-Procedure Table
- Fault Table

In addition to the data structures defined by the architecture,
the 80960CA requires several implementation-specific data
structures which are used for configuring peripherals and
initialization. These data structures include:

- Control Table
- Process Control Block
- Supervisor Stack
- Initialization Boot Record


Each data structure will be explained in more detail later in
this product overview.


3.3 Memory Addressing Modes


The 80960CA offers a variety of modes for memory addressing.
The addressing modes available are summarized in Table 3-1.


Absolute addressing is used to reference an address as an offset
from address 0 of the processor’s address space. At the machine
level, absolute addressing may be implemented in one of two ways
depending on the size of the absolute offset from address 0. Two
instruction formats, MEMA and MEMB, are used to provide absolute
addressing modes. For the MEMA format, the offset is an ordinal
number ranging from 0 to 2048. For the MEMB format, the offset
is an integer (called a displacement) ranging from -2^31 to
2^31 - 1. An assembler will choose the MEMA or MEMB format based
on the size of the offset.


Register-indirect addressing modes use a 32-bit ordinal value in
a register as the base for the address calculation. Offsets and
indexes are added to this address base depending on the
particular addressing mode. The register-indirect-with-index
addressing mode adds a scaled index to the address base. The
index is specified as a value in a register. The scale value may
be selected as 1, 2, 4, 8, or 16.


The index-with-displacement addressing mode uses a scaled index
plus an integer displacement. No address base is used in this
address calculation.


The IP-with-displacement addressing mode is used with load and
store instructions to make them IP relative. In this mode, an
integer displacement plus a constant of 8 is added to the IP of
the instruction to calculate the next address.


Table 3-1. Memory Addressing Modes

Mode                                  Address Computation
Register Indirect with Index          Abase + (Index*Scale)
Register Indirect with Index          Abase + (Index*Scale)
  and Displacement                      + Displacement
Index with Displacement               (Index*Scale) + Displacement
Register Indirect with Displacement   Abase + Displacement
IP with Displacement                  IP + Displacement + 8
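
The address computations of Table 3-1 can be expressed as a small function. This is a sketch only: the mode names and the function are invented for illustration, and wrapping the result to 32 bits is an assumption about how an out-of-range sum would behave.

```python
MASK32 = 0xFFFFFFFF  # assumption: results wrap to the 32-bit address space


def effective_address(mode, abase=0, index=0, scale=1, displacement=0, ip=0):
    """Compute an effective address for the modes listed in Table 3-1."""
    if scale not in (1, 2, 4, 8, 16):
        raise ValueError("scale must be 1, 2, 4, 8, or 16")
    if mode == "register_indirect_index":
        ea = abase + index * scale
    elif mode == "register_indirect_index_displacement":
        ea = abase + index * scale + displacement
    elif mode == "index_displacement":
        ea = index * scale + displacement      # no address base is used
    elif mode == "register_indirect_displacement":
        ea = abase + displacement
    elif mode == "ip_displacement":
        ea = ip + displacement + 8             # constant of 8 from the table
    else:
        raise ValueError("unknown addressing mode: " + mode)
    return ea & MASK32


if __name__ == "__main__":
    # Element 3 of an array of words at 0x1000, skipping an 8-byte header.
    print(hex(effective_address("register_indirect_index_displacement",
                                abase=0x1000, index=3, scale=4,
                                displacement=8)))
```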
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3.4 Data Types 


The 80960CA operates on the following data types (Figure 3-3):
- Integer (8, 16, 32, and 64 bits)
- Ordinal (8, 16, 32, and 64 bits)
- Bit
- Bit Field
- Triple Word (96 bits)
- Quad Word (128 bits)


Class         Data Type       Length       Range

Numeric       Byte Integer    8 bits       -2^7 to 2^7 - 1
(Integer)     Short Integer   16 bits      -2^15 to 2^15 - 1
              Integer         32 bits      -2^31 to 2^31 - 1
              Long Integer    64 bits      -2^63 to 2^63 - 1

Numeric       Byte Ordinal    8 bits       0 to 2^8 - 1
(Ordinal)     Short Ordinal   16 bits      0 to 2^16 - 1
              Ordinal         32 bits      0 to 2^32 - 1
              Long Ordinal    64 bits      0 to 2^64 - 1

Non-Numeric   Bit             1 bit
              Bit Field       1-32 bits
              Triple Word     96 bits
              Quad Word       128 bits


Figure 3-3. Data Types

The following sections describe the data types supported by the
80960CA.
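
The ranges listed in Figure 3-3 follow directly from the operand length. The sketch below recomputes them; the function names are illustrative and not part of any Intel tool.

```python
def integer_range(bits: int) -> tuple:
    """Range of a signed 2's complement integer of the given length."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def ordinal_range(bits: int) -> tuple:
    """Range of an unsigned ordinal of the given length."""
    return 0, (1 << bits) - 1


if __name__ == "__main__":
    for name, bits in (("byte", 8), ("short", 16), ("word", 32), ("long", 64)):
        print(name, integer_range(bits), ordinal_range(bits))
```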


3.4.1 NUMERIC DATA TYPES 


Integers and ordinals are considered numeric data types since
the processor performs arithmetic operations with this data. The
integer data type is a signed binary value in standard 2’s
complement representation. The ordinal data type is an unsigned
binary value.


3.4.2 NON-NUMERIC DATA TYPES 


The remaining data types (bit field, triple word, and 
quad word) represent groupings of bits or bytes that the 
processor can operate on as a whole, regardless of the 
nature of the data contained in the group. These data 
types facilitate the moving of blocks of bits or bytes. 


3.5 Instruction Set 


The 80960CA features a comprehensive instruction set
(Table 3-2). Much of the instruction set is that of a RISC
architecture. Unlike pure RISC machines, however, the 80960CA
provides an extension to the RISC instruction set with
instructions that perform complex functions such as procedure
calls and returns, high-speed multiplies, and other complex
control, arithmetic, and logical operations. The instruction set
allows functionally complex yet highly compact code to be written
for embedded control applications where memory is a valuable
commodity.


3.5.1 INSTRUCTION GROUPS


The 80960CA instruction set is most easily described if grouped
by the functions listed below:
- Data Movement
- Address Computation
- Logical and Arithmetic
- Bit and Bit Field
- Comparison
- Branch
- Call and Return
- Fault
- Debug
- Processor Management


The instructions which make up each of these groups are
described in the following sections.


3.5.1.1 Data Movement Instructions


The data movement instructions move data from memory to
registers, from registers to memory, and between registers. The
load instructions copy bytes, words, or multiple words from
memory to a selected register or group of registers. Conversely,
the store instructions copy bytes, words, or groups of words from
a selected register or group of registers to memory. The move
instructions copy data between registers.


Load Instructions

- ld      load word
- ldob    load ordinal byte
- ldos    load ordinal short
- ldib    load integer byte
- ldis    load integer short
- ldl     load long
- ldt     load triple
- ldq     load quad


Store Instructions

- st      store word
- stob    store ordinal byte
- stos    store ordinal short
- stib    store integer byte
- stis    store integer short
- stl     store long
- stt     store triple
- stq     store quad


Move Instructions

- mov     move word
- movl    move long
- movt    move triple
- movq    move quad


3.5.1.2 Address Computation Instructions 


The load address (lda) instruction causes a 32-bit address to be
computed and placed in a destination register. The address is
computed based on the addressing mode selected. The load and
store instructions perform a function identical to that of the
lda instruction when calculating a source or destination address.
The lda instruction is useful for loading a 32-bit constant into
a register.


3.5.1.3 Logical and Arithmetic Instructions 


Logical instructions perform bitwise Boolean operations on
operands in registers. Since this group of instructions performs
only bitwise manipulations of data, separate logical
instructions for integer and ordinal data types do not exist. In
the Logical Instructions list below, src1 and src2 represent
processor registers or literals which are the operands for these
instructions.


Table 3-2. Instruction Set Summary

Group                  Instructions

Data Movement          Load, Store, Move

Arithmetic             Add, Subtract, Multiply, Divide, Remainder,
                       Modulo, Shift, Extended Shift, Extended
                       Multiply, Extended Divide, Add with Carry,
                       Subtract with Carry

Logical                And, Not And, And Not, Or, Exclusive Or,
                       Not Or, Or Not, Nor, Exclusive Nor, Not,
                       Nand, Rotate

Bit and Bit Field      Set Bit, Clear Bit, Not Bit, Check Bit,
                       Alter Bit, Scan for Bit, Scan for Byte,
                       Span over Bit, Extract, Modify

Comparison             Compare, Conditional Compare, Compare and
                       Increment, Compare and Decrement,
                       Condition Test

Branch                 Unconditional Branch, Conditional Branch,
                       Branch and Link, Compare and Conditional
                       Branch

Call and Return        Call, Call Extended, Call System, Return

Fault                  Conditional Fault, Synchronize Faults

Debug                  Modify Trace Controls, Mark, Force Mark

Processor Management   Modify Process Controls, Modify Arithmetic
                       Controls, System Control, Update DMA,
                       Setup DMA, Flush Local Registers

Address Computation    Load Address

Atomic                 Atomic Add, Atomic Modify


Logical Instructions

- and      src1 and src2
- notand   src1 and (not src2)
- andnot   (not src1) and src2
- or       src1 or src2
- notor    src1 or (not src2)
- ornot    (not src1) or src2
- xor      src1 xor src2
- xnor     src1 xnor src2
- nor      not (src1 or src2)
- nand     not (src1 and src2)
- not      not (src1)
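
The Boolean expressions in the list above can be checked with a small model. This is a sketch that treats operands as 32-bit ordinals; the `logical` helper and its dictionary are invented for illustration and say nothing about instruction encoding or operand order in assembly.

```python
MASK32 = 0xFFFFFFFF


def _inv(x):
    """Bitwise complement limited to 32 bits."""
    return ~x & MASK32


# Each entry mirrors one line of the Logical Instructions list.
_OPS = {
    "and":    lambda a, b: a & b,
    "notand": lambda a, b: a & _inv(b),
    "andnot": lambda a, b: _inv(a) & b,
    "or":     lambda a, b: a | b,
    "notor":  lambda a, b: a | _inv(b),
    "ornot":  lambda a, b: _inv(a) | b,
    "xor":    lambda a, b: a ^ b,
    "xnor":   lambda a, b: _inv(a ^ b),
    "nor":    lambda a, b: _inv(a | b),
    "nand":   lambda a, b: _inv(a & b),
    "not":    lambda a, b: _inv(a),   # single operand; b is ignored
}


def logical(mnemonic, src1, src2=0):
    """Evaluate one logical instruction on 32-bit operands."""
    return _OPS[mnemonic](src1 & MASK32, src2 & MASK32) & MASK32


if __name__ == "__main__":
    print(bin(logical("notand", 0b1100, 0b1010)))
    print(bin(logical("andnot", 0b1100, 0b1010)))
```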


Arithmetic instructions perform add, subtract, multi- 
ply, divide, and shift operations on integer or ordinal 
operands in registers. 


Arithmetic Instructions

- addi     add integer
- addo     add ordinal
- subi     subtract integer
- subo     subtract ordinal
- muli     multiply integer
- mulo     multiply ordinal
- divi     divide integer
- divo     divide ordinal
- remi     remainder integer
- remo     remainder ordinal
- modi     modulo integer
- rotate   rotate bit left
- shli     shift left integer
- shlo     shift left ordinal
- shri     shift right integer
- shro     shift right ordinal
- shrdi    shift right dividing integer


Extended arithmetic instructions facilitate computation on
ordinals and integers which are longer than 32 bits. In add with
carry and subtract with carry instructions, the carry out from
the previous arithmetic instruction is used in the computation.
The extended multiply instruction multiplies two ordinal source
operands producing a long ordinal result (64 bits). The extended
divide instruction divides a long ordinal dividend by an ordinal
divisor and produces a 64-bit result. The extended shift right
instruction shifts a 64-bit source value and produces the lower
order 32 bits of the shifted value.


Extended Arithmetic Instructions

- addc     add ordinal with carry
- subc     subtract ordinal with carry
- emul     extended multiply
- ediv     extended divide
- eshro    shift right extended ordinal
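
The carry chaining described above can be illustrated with a 64-bit addition built from two 32-bit add-with-carry steps, alongside a model of the extended multiply. This is a behavioral sketch: the Python functions borrow the instruction names for readability, and how the carry is held between instructions on the processor is not modeled.

```python
MASK32 = 0xFFFFFFFF


def addc(src1, src2, carry_in):
    """32-bit add with carry in; returns (32-bit sum, carry out)."""
    total = (src1 & MASK32) + (src2 & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a, b):
    """Add two 64-bit ordinals one 32-bit word at a time."""
    low, carry = addc(a, b, 0)                 # low words, no carry in
    high, _ = addc(a >> 32, b >> 32, carry)    # high words plus the carry
    return (high << 32) | low


def emul(src1, src2):
    """Multiply two 32-bit ordinals into a 64-bit long ordinal."""
    return (src1 & MASK32) * (src2 & MASK32)


if __name__ == "__main__":
    print(hex(add64(0x00000000FFFFFFFF, 1)))
    print(hex(emul(0xFFFFFFFF, 0xFFFFFFFF)))
```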


The atomic instructions perform read-modify-write operations on
operands in memory. They allow a system to ensure that when an
atomic operation is performed on a specified memory location,
the operation will be completed before another agent is allowed
to perform an operation on the same memory. These instructions
are required to enable synchronization between interrupt
handlers and background tasks in any system. They are also
particularly useful in systems where several agents (processors,
coprocessors, or external logic) have access to the same system
memory for communication.


Atomic Instructions 


-atadd atomic add 
-atmod atomic modify 
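
The read-modify-write behavior can be sketched in software, with a lock standing in for the indivisible bus operation. The operand semantics shown here (return the old value; for the modify form, replace only the bits selected by a mask) are assumptions made for the illustration and are not stated in this overview; the `Memory` class is hypothetical.

```python
import threading

MASK32 = 0xFFFFFFFF


class Memory:
    """Word-addressed memory whose atomic operations cannot interleave."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()  # stands in for the locked bus cycle

    def load(self, address):
        return self._words.get(address, 0)

    def atadd(self, address, value):
        """Atomically add value to the word; return the old contents."""
        with self._lock:
            old = self._words.get(address, 0)
            self._words[address] = (old + value) & MASK32
            return old

    def atmod(self, address, mask, value):
        """Atomically replace the masked bits; return the old contents."""
        with self._lock:
            old = self._words.get(address, 0)
            self._words[address] = ((value & mask) | (old & ~mask)) & MASK32
            return old


if __name__ == "__main__":
    mem = Memory()
    workers = [threading.Thread(
        target=lambda: [mem.atadd(0x100, 1) for _ in range(1000)])
        for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(mem.load(0x100))  # no increment is lost
```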


3.5.1.4 Bit and Bit Field Instructions 


The bit instructions operate on a specified bit in a register.


Bit Instructions

- setbit     set bit
- clrbit     clear bit
- notbit     not bit
- alterbit   alter bit
- scanbit    scan for bit
- spanbit    span over bit


Bit field instructions operate on a specified contiguous group
of bits in a register. This group of bits can be from 0 to 32
bits in length.


Bit Field Instructions

- extract    extract field
- modify     modify field
- scanbyte   scan for byte
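
A bit field operation can be modeled as a shift combined with a mask. The sketch below assumes a field is described by a starting bit position and a length, and that a modify operation merges source bits under a mask; these operand conventions are assumptions for illustration and are not taken from this overview.

```python
MASK32 = 0xFFFFFFFF


def extract(value, bitpos, length):
    """Return the field of `length` bits starting at bit `bitpos`."""
    if not 0 <= bitpos <= 31:
        raise ValueError("bit position must be 0 to 31")
    if not 0 <= length <= 32:
        raise ValueError("field length must be 0 to 32 bits")
    return ((value & MASK32) >> bitpos) & ((1 << length) - 1)


def modify(dst, mask, src):
    """Replace the bits of dst selected by mask with the bits of src."""
    return ((src & mask) | (dst & ~mask)) & MASK32


if __name__ == "__main__":
    print(hex(extract(0xABCD1234, 8, 8)))
    print(hex(modify(0x1234, 0x00FF, 0xAB)))
```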


3.5.1.5 Branch Instructions 


The branch instructions allow the direction of program 
flow to be changed by explicitly modifying the 
Instruction Pointer (IP). The target IP in a branch in- 
struction is generally specified as a displacement to be 
added to the current IP. The extended branch instruc- 
tions allow IP calculation using any addressing mode. 


The unconditional branch instructions always alter pro- 
gram flow when executed. 


Unconditional Branch Instructions

- b       branch
- bx      branch extended


The RISC branch-and-link instructions automatically save a
Return Instruction Pointer (RIP) before the jump is taken. The
RIP is the address of the instruction following the branch and
link.


Branch and Link Instructions

- bal     branch and link
- balx    branch and link extended


Conditional branch instructions alter program flow only if the
condition code flags in the arithmetic control register match a
value specified in the instruction. The condition code flags
indicate conditions of equality or inequality between two
operands in a previously executed instruction. The arithmetic
control register and condition code flags are described in
Section 3.6.


Based on a branch prediction flag located in the ma- 
chine level instruction, the 80960CA will assume that 
an instruction usually takes or does not take a condi- 
tional branch. By executing along the predicted path of 
program flow, delays due to breaks in the instruction 
stream are often avoided. This feature of the 80960CA 
is referred to as branch prediction. The 80960CA incor- 
porates the branch prediction feature because code us- 
ing a conditional branch instruction usually favors a 
single direction of program flow. 


The branch prediction flag is specified at the assembly level by
appending a .t or .f to a conditional branch instruction
meaning, respectively, “assume branch taken” or “assume branch
not taken”. For example, the assembler mnemonic be.t means that
the processor will assume that this branch-if-equal instruction
usually branches when encountered. In the following table, .p
represents the branch prediction flag.


Conditional Branch Instructions

- be.p     branch if equal
- bne.p    branch if not equal
- bl.p     branch if less
- ble.p    branch if less or equal
- bg.p     branch if greater
- bge.p    branch if greater or equal
- bo.p     branch if ordered
- bno.p    branch if unordered


Compare and conditional branch instructions compare two
operands, then branch according to the immediate results.


Compare and Conditional Branch Instructions

- cmpibe.p    compare integer and branch if equal
- cmpibne.p   compare integer and branch if not equal
- cmpibl.p    compare integer and branch if less
- cmpible.p   compare integer and branch if less or equal
- cmpibg.p    compare integer and branch if greater
- cmpibge.p   compare integer and branch if greater or equal
- cmpibo.p    compare integer and branch if ordered
- cmpibno.p   compare integer and branch if unordered
- cmpobe.p    compare ordinal and branch if equal
- cmpobne.p   compare ordinal and branch if not equal
- cmpobl.p    compare ordinal and branch if less
- cmpoble.p   compare ordinal and branch if less or equal
- cmpobg.p    compare ordinal and branch if greater
- cmpobge.p   compare ordinal and branch if greater or equal
- bbs.p       check bit and branch if set
- bbc.p       check bit and branch if clear


3.5.1.6 Compare and Condition Test 
Instructions 


The 80960CA provides several types of instructions 
that are used to compare two operands. The condition 
code flags in the arithmetic control register are set to 
indicate whether one operand is less than, equal to, or 
greater than the other operand. 


Compare Instructions

- cmpi     compare integer
- cmpo     compare ordinal
- chkbit   check bit


Conditional compare instructions test the existing status of the
condition code flags before a compare is performed. These
conditional compare instructions are provided to optimize
two-sided range comparisons (i.e., to test if a value is less
than one number but greater than another).
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Conditional Compare Instructions 


-concmpi conditional compare integer 
- concmpo conditional compare ordinal 


The compare and increment and compare and decrement instructions
set the condition code flags based on a comparison of two
register sources, decrement or increment one of the sources, and
finally store this result in a destination register.


- cmpinci compare and increment integer 
-cmpinco compare and increment ordinal 
-cmpdeci compare and decrement integer 
-cmpdeco compare and decrement ordinal 
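
The combined compare and update can be sketched as a function that returns both the condition code and the new value. The condition code values follow Table 3-3; which of the two sources is incremented or decremented is not stated in this overview, so the choice of the second source here is an assumption, as are the function names used for the model.

```python
MASK32 = 0xFFFFFFFF
LESS, EQUAL, GREATER = 0b100, 0b010, 0b001   # encodings from Table 3-3


def compare_ordinal(src1, src2):
    """Condition code describing src1 relative to src2 (unsigned)."""
    a, b = src1 & MASK32, src2 & MASK32
    if a < b:
        return LESS
    if a == b:
        return EQUAL
    return GREATER


def cmpinco(src1, src2):
    """Compare, then return (condition code, src2 + 1)."""
    return compare_ordinal(src1, src2), (src2 + 1) & MASK32


def cmpdeco(src1, src2):
    """Compare, then return (condition code, src2 - 1)."""
    return compare_ordinal(src1, src2), (src2 - 1) & MASK32


if __name__ == "__main__":
    print(cmpinco(5, 4))
    print(cmpdeco(0, 0))
```

Folding the counter update into the compare is what makes these instructions convenient for loop control: one instruction both tests the loop bound and advances the counter.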


The condition test instructions allow the state of the condition
code flags to be tested. Based on the outcome of the comparison,
a true or false code is stored in a destination register. The
branch prediction flag is used in this instruction to reduce the
execution time of the instruction when the test outcome is
predicted correctly. For example, teste.t (test if equal) will
execute in a shorter time if the condition code flags test true
for the equal condition. Analogous to the function of the branch
prediction flag in the conditional compare and branch
instructions, the prediction flag in this case eliminates breaks
in the micro-instruction sequence which is used to implement the
condition test instructions.


Condition Test Instructions

- teste.p    test if equal
- testne.p   test if not equal
- testl.p    test if less
- testle.p   test if less or equal
- testg.p    test if greater
- testge.p   test if greater or equal
- testo.p    test if ordered
- testno.p   test if not ordered


3.5.1.7 Call and Return Instructions 


The 80960CA features an on-chip call and return mechanism for
making procedure calls to local and system procedures. The call
instructions and the call and return mechanism are described in
Section 3.8.


Call and Return Instructions

- call    call
- callx   call extended
- calls   call system
- ret     return
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3.5.1.8 Fault Instructions 


The 80960CA will fault automatically as the result of certain
errant operations which may occur when executing code. Fault
procedures are then invoked automatically to handle the various
types of faults. In addition, the fault instructions permit a
fault to be generated explicitly based on the value of the
condition code flags. The branch prediction flag in these
instructions is used to reduce the execution time of these
instructions when the state of the condition code flags is
predicted correctly.


Conditional Fault Instructions

- faulte.p    fault if equal
- faultne.p   fault if not equal
- faultl.p    fault if less
- faultle.p   fault if less or equal
- faultg.p    fault if greater
- faultge.p   fault if greater or equal
- faulto.p    fault if ordered
- faultno.p   fault if unordered


The syncf instruction causes the processor to wait for all
faults to be generated which are associated with any prior
uncompleted instructions.


- syncf   synchronize faults


3.5.1.9 Debug Instructions 


The processor supports debugging and monitoring of program
activity through the use of trace events. The debug instructions
support debugging and monitoring software.


Debug Instructions

- modtc   modify trace controls
- mark    mark
- fmark   force mark


3.5.1.10 Processor Management Instructions 


The 80960CA provides several instructions for direct 
control of processor functions and for configuring the 
80960CA’s peripherals. A brief description of the proc- 
essor management instructions is given below. 


Processor Management Instructions

- modpc      modify process controls
- modac      modify arithmetic controls
- sysctl     system control instruction
- udma       update DMA SRAM
- sdma       setup DMA
- flushreg   flush local registers
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3.6 Arithmetic Controls 


The Arithmetic Control (AC) Register is a 32-bit on-chip 
register (Figure 3-4). The AC register is used primarily 
to monitor and control the execution of 80960CA arith- 
metic instructions. The processor reads and modifies 
bits in the AC register when performing many arithme- 
tic operations. The AC register is also used to control 
the faulting conditions for some instructions. The 
modac instruction allows the user to directly read or 
modify the AC register. 


The processor sets the condition code flags (bits 0-2) to 
indicate equality or inequality as the result of certain 
instructions (such as the compare instructions). Other 
instructions, such as the conditional branch instruc- 
tions, take action based on the value of the condition 
code flags. Table 3-3 shows the functional assignment 
for each condition code flag. 


Table 3-3. Arithmetic Condition Codes

Condition Code   Condition
001              Greater Than
010              Equal
100              Less Than
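
The interaction between compare and conditional branch instructions can be modeled with the codes of Table 3-3. In the sketch below, treating each branch condition as a bit mask tested against the condition code is an assumption made for the illustration; the mask values are simply built by combining the three codes in the table, and the ordered and unordered conditions are left out.

```python
MASK32 = 0xFFFFFFFF
LESS, EQUAL, GREATER = 0b100, 0b010, 0b001   # Table 3-3


def _code(a, b):
    return LESS if a < b else EQUAL if a == b else GREATER


def _signed(x):
    """Interpret a 32-bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def cmpo(src1, src2):
    """Compare as ordinals (unsigned)."""
    return _code(src1 & MASK32, src2 & MASK32)


def cmpi(src1, src2):
    """Compare as integers (signed)."""
    return _code(_signed(src1), _signed(src2))


# Assumed masks: each condition is the OR of the codes it accepts.
BRANCH_MASKS = {
    "bg": GREATER, "be": EQUAL, "bl": LESS,
    "bge": GREATER | EQUAL, "ble": LESS | EQUAL, "bne": LESS | GREATER,
}


def branch_taken(mnemonic, condition_code):
    return bool(BRANCH_MASKS[mnemonic] & condition_code)


if __name__ == "__main__":
    cc = cmpi(3, 5)
    print(bin(cc), branch_taken("ble", cc), branch_taken("bg", cc))
```

The same bit pattern compares differently as an integer and as an ordinal: FFFFFFFFH is -1 when signed but the largest 32-bit value when unsigned.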
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The integer overflow flag (bit 8) and the integer over- 
flow mask (bit 12) are used in conjunction with the 
arithmetic integer overflow fault. The mask bit masks 
the integer overflow fault. When the fault is masked, 
and an integer overflow occurs, the integer overflow 
flag is set but no fault handling action is taken. If the 
fault is not masked, and an integer overflow occurs, the 
integer overflow fault is taken and the integer overflow 
flag is not set. 
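
The interaction of the overflow flag and mask can be sketched as follows. Bit positions 8 and 12 come from the text above; the exception class, the function name, and the choice to return a wrapped 32-bit result when the fault is masked are assumptions made for the illustration.

```python
OVERFLOW_FLAG = 1 << 8    # integer overflow flag (AC bit 8)
OVERFLOW_MASK = 1 << 12   # integer overflow mask (AC bit 12)
INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1


class IntegerOverflowFault(Exception):
    """Stands in for the arithmetic integer overflow fault."""


def add_integer(src1, src2, ac):
    """Add two 32-bit integers; return (result, updated AC value)."""
    exact = src1 + src2
    if INT_MIN <= exact <= INT_MAX:
        return exact, ac
    if ac & OVERFLOW_MASK:
        # Fault masked: set the flag and continue (wrapped result assumed).
        wrapped = ((exact - INT_MIN) % (1 << 32)) + INT_MIN
        return wrapped, ac | OVERFLOW_FLAG
    # Fault not masked: the fault is taken and the flag is not set.
    raise IntegerOverflowFault("integer overflow in add")


if __name__ == "__main__":
    print(add_integer(INT_MAX, 1, OVERFLOW_MASK))
```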


The no imprecise faults flag (bit 15) determines if im- 
precise faults are allowed to occur. Fault handling and 
precise and imprecise faults in the 80960CA are dis- 
cussed in Section 3.10. 


3.7 Process Management 


Process management refers to the monitoring and con- 
trol of certain properties of an executing process. The 
following sections describe the mechanisms available on 
the 80960CA to perform this function. 


[Figure: 32-bit AC register layout showing the Condition Code,
Integer Overflow Flag (bit 8), Integer Overflow Mask (bit 12),
and No Imprecise Faults (bit 15) fields; all other bits are
reserved (initialize to 0)]


Figure 3-4. Arithmetic Control Register
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3.7.1 PROCESS CONTROL REGISTER 


The Process Control (PC) Register (Figure 3-5) provides access
to process state information. The function of each field in the
PC register is described below.


Execution Mode Flag: This flag indicates that the processor is
executing in user mode (0) or supervisor mode (1).


Priority Field: This 5-bit field indicates the current executing
priority of the processor. Priority values range from 0 to 31,
with 0 as the lowest and 31 as the highest priority.


State Flag: This flag determines the executing state of the
processor. The processor state is either executing state (0) or
interrupted state (1).


Trace Enable Bit and Trace Fault Pending Flag: These fields
control and monitor trace activity in the processor. The Trace
Enable Bit enables fault generation for trace events. The Trace
Fault Pending Flag indicates that a trace event has been
detected.


The process controls can be modified by software with the modify
process controls (modpc) instruction. The modpc instruction may
only write the PC register when the processor is in supervisor
mode.


3.7.2 PRIORITIES


The 80960 architecture defines a means to assign priorities to
executing programs and interrupts. The current priority of the
processor is stored in the priority field of the PC register.
This priority is used to determine if an interrupt will be
serviced and in which order multiple pending interrupts will be
serviced. Setting the priority of an executing program above
that of interrupts allows critical code to be prioritized and
run without interruption.


The priority field of the PC register can be modified directly
using the modpc instruction. The priority field is also modified
to reflect the priority of serviced interrupts. On a return from
an interrupt routine, the priority of the processor is restored
to its priority before the interrupt occurred.


3.7.3 PROCESSOR STATES AND MODES 


The 80960CA may execute programs in user mode or supervisor
mode. The user-supervisor protection mechanism allows a system
to be designed in which kernel code and data reside in the same
address space as user code and data, but access to the kernel
procedures and data is only allowed through a tightly controlled
interface. This interface is the system call table and the
interrupt mechanism. The 80960CA provides a supervisor pin (SUP)
to implement memory systems which protect code and data from
possible corruption by programs executing in user mode. Some
instructions and functions of the 80960CA are also insulated
from code executing in user mode.


The processor has two operating states: executing and 
interrupted. In executing state, the processor can exe- 
cute in user or supervisor mode. In the interrupted 
state, the processor always executes in supervisor mode. 


3.8 Call and Return Mechanism 


The 80960 architecture features a built-in call and return
mechanism. This mechanism is designed to make procedure calls
simple and fast, and to provide a flexible method for storing
and handling variables that are local to a procedure. A call
automatically allocates a new set of local registers and a new
stack frame. All linkage information is maintained by the
processor, making procedure calls and returns virtually
transparent to the user. A system call instruction is provided
as a method for calling privileged procedures such as a kernel
service. The call and return model supports efficient
translation of structured high level code (such as C or ADA) to
80960 machine language.


The procedure call and return mechanism provides a 
number of significant benefits which contribute to the 
performance and ease of use of the 80960CA. 


1) The call and return instructions are implemented entirely
on-chip, resulting in an extremely high performance
implementation of these commonly used functions.


[Figure 3-5. Process Control Register: fields shown are Trace
Enable, Execution Mode, Trace Fault Pending, State, and
Priority; all other bits are reserved (initialize to 0)]
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2) A single instruction to implement each call or return 
operation results in code density improvements com- 
pared to processors which require multiple instruc- 
tions to encode these functions. 


3) By implementing the call and return functions as 
single instructions, the 80960 architecture is open for 
further optimization of these instructions, while 
maintaining assembly-level compatibility. 


4) A program does not have to explicitly save or restore 
the variables stored in the local registers when a call 
or return is executed. The processor does this implic- 
itly on procedure calls and on returns. 


5) The call and return mechanism provides a structure 
for storing a virtually unlimited number of local 
variables for each procedure: the on-chip local regis- 
ters provide quick access to often used variables and 
the stack provides space for additional variables. 


3.8.1 LOCAL REGISTERS AND THE STACK 
FRAME 


At any point in a program, the 80960 has access to a 
local register set and a section of the procedure stack 
referred to as a stack frame. When a call is executed, a 
new stack frame is allocated for the called procedure. 
Additionally, the current local register set is saved by 
the processor, freeing these registers for use by the new- 
ly called procedure. In this way, every procedure has a 
unique stack and unique set of local registers. When a 


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return is executed, the current local register set and 
current stack frame are deallocated. The previous local 
register set and previous stack frame are restored. This 
call and return mechanism is illustrated in Figure 3-6 
where n is procedure depth for the currently executing 
procedure. 


The procedure stack structure is defined by the 80960 
architecture. The procedure stack always grows up- 
ward (i.e. towards higher addresses) and the stack 
pointer (SP) always points to the next available byte of 
the stack frame. The 80960CA requires that each stack 
frame begins on a 16-byte boundary. Due to this align- 
ment requirement, a padding space of 0 to 15 bytes may 
exist between adjacent stack frames in memory. When 
a stack frame is allocated, the first 16 words are always 
assigned as storage for the local registers; therefore, the 
SP initially points to the 17th word in the stack frame. 
It should be noted that although each stack frame is 
assigned storage space for the local registers, these loca- 
tions in the stack are not guaranteed to contain the 
values of the saved local registers. This is because sever- 
al sets of local registers are cached on-chip rather than 
written to the stack in external memory. This caching 
mechanism is described in detail later in this section. 
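
The frame layout rules above reduce to a short calculation. This sketch takes the caller's stack pointer and derives where the next frame would start, assuming the new frame begins at the first 16-byte boundary at or above the caller's SP; the function and constant names are illustrative.

```python
FRAME_ALIGNMENT = 16            # each frame begins on a 16-byte boundary
LOCAL_REGISTER_BYTES = 16 * 4   # first 16 words hold the local registers


def allocate_frame(caller_sp):
    """Return (new frame address, initial SP, padding bytes)."""
    new_fp = (caller_sp + FRAME_ALIGNMENT - 1) & ~(FRAME_ALIGNMENT - 1)
    padding = new_fp - caller_sp              # 0 to 15 bytes
    new_sp = new_fp + LOCAL_REGISTER_BYTES    # points at the 17th word
    return new_fp, new_sp, padding


if __name__ == "__main__":
    fp, sp, pad = allocate_frame(0x1044)
    print(hex(fp), hex(sp), pad)
```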


3.8.2 PROCEDURE LINKING 


The 80960 architecture automatically manages proce- 
dure linkage. One global register and three local regis- 
ters are reserved for procedure linkage information. 


[Figure: each procedure, from Procedure 1 through Procedure (n),
has its own stack frame and local register set; the registers of
Procedures 1 to (n-1) are saved, and those of Procedure (n) are
current]


Figure 3-6. Call and Return Mechanism

Figure 3-7 describes the pointer structure used to link frames
and to provide a unique SP for each frame. Register g15 is the
Frame Pointer (FP). The FP is the address of the first byte of
the current (topmost) stack frame. The FP is always updated to
point to the current frame when calls and returns are executed.
Register r0 is the Previous Frame Pointer (PFP). The PFP is the
address of the first byte of the stack frame which was created
prior to the frame containing this PFP. Register r1 is the Stack
Pointer (SP). The SP points to the next available byte of the
stack frame. Register r2 is reserved for the Return Instruction
Pointer (RIP). The RIP is the address of the instruction which
follows a call instruction; this is also the target address for
the return from that procedure. The RIP is automatically stored
in register r2 of the calling procedure when a call is executed.
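
The linkage performed on a call and a return can be sketched as a small simulation. The class and attribute names are hypothetical, the stack base is assumed to be 16-byte aligned, and the local register cache is ignored; only the roles of g15, r0, r1, and r2 described above are modeled.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    pfp: int        # r0: Previous Frame Pointer
    sp: int         # r1: Stack Pointer
    rip: int = 0    # r2: Return Instruction Pointer


class Processor:
    def __init__(self, stack_base):
        self.fp = stack_base    # g15: Frame Pointer
        self.ip = 0
        # The first frame reserves 16 words (64 bytes) for local registers.
        self.frames = {stack_base: Frame(pfp=0, sp=stack_base + 64)}

    def call(self, target, return_address):
        caller = self.frames[self.fp]
        caller.rip = return_address           # saved in the caller's r2
        new_fp = (caller.sp + 15) & ~15       # next 16-byte boundary
        self.frames[new_fp] = Frame(pfp=self.fp, sp=new_fp + 64)
        self.fp = new_fp
        self.ip = target

    def ret(self):
        callee = self.frames.pop(self.fp)     # deallocate the current frame
        self.fp = callee.pfp                  # restore the previous frame
        self.ip = self.frames[self.fp].rip    # resume after the call


if __name__ == "__main__":
    cpu = Processor(0x1000)
    cpu.call(target=0x4000, return_address=0x2004)
    print(hex(cpu.fp), hex(cpu.frames[cpu.fp].pfp))
    cpu.ret()
    print(hex(cpu.fp), hex(cpu.ip))
```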


3.8.3 PARAMETER PASSING 


Parameters may be passed by value or passed by refer- 
ence between procedures. The global registers, the 
stack, or predefined data structures in memory may be 
used to pass these parameters.


The global registers provide the fastest method for pass- 
ing parameters. The values to be passed into a proce- 
dure reside in the global registers of the calling proce- 
dure. When a procedure is called, the values in the 
global registers are preserved. If more parameters are to 
be passed than will fit in the global registers, additional 
parameters may be passed in the stack of the calling 
procedure, or in a data structure which is referenced by 
a pointer passed in the global registers.


3.8.4 LOCAL REGISTER CACHE


The 80960CA provides an on-chip cache for saving and
restoring the local registers on calls and returns. This
cache greatly enhances performance of the call and return
mechanism on the 80960CA. Movement of data between the
local registers and the register cache is typically
accomplished in only 4 processor clocks with no external bus
traffic. When this cache is filled, the registers associated
with the oldest stack frame are moved to the area reserved
for those registers on the physical stack (Figure 3-7).


[Figure: the stack grows toward higher addresses. Each frame
begins with an area reserved for the local registers (PFP in
r0, SP in r1, RIP in r2), followed by optional stack variables
and a padding area. Global register g15 holds the Frame
Pointer.]

Figure 3-7. Stack Frame Linkage

The local register cache is a physical extension of the
internal data RAM. The part of the data RAM used for this
cache is not visible to the user and is large enough to hold
up to 5 sets of local registers. The register cache may be
extended to hold up to 15 sets of local registers. When
extended, each new register set consumes 16 words of the
user’s data RAM, beginning at the highest address and growing
downward. The size of the local register cache is selected
when the processor is initialized.
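
The sizing rule above can be sketched as a small calculation. This is only an illustration: the function and constant names are hypothetical, and the sketch assumes 4-byte words and a configured size of between 5 and 15 register sets.

```python
# Hypothetical sketch of the register cache sizing rule described above.
BUILT_IN_SETS = 5            # sets held in the hidden part of the cache
MAX_SETS = 15                # largest configurable cache size
WORDS_PER_REGISTER_SET = 16  # data RAM words consumed per added set
BYTES_PER_WORD = 4           # assumption: 32-bit words


def data_ram_bytes_consumed(total_sets):
    """Return the bytes of user data RAM taken by the extended cache."""
    if not BUILT_IN_SETS <= total_sets <= MAX_SETS:
        raise ValueError("this sketch only models 5 to 15 register sets")
    extra_sets = total_sets - BUILT_IN_SETS
    return extra_sets * WORDS_PER_REGISTER_SET * BYTES_PER_WORD
```

Under these assumptions, the default configuration uses no user data RAM, and the largest configuration takes 10 sets x 16 words x 4 bytes from the top of the data RAM.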


In some cases, the contents of the cached local register
sets may require examination or modification (e.g. for
fault handling). Since the local registers are cached, the
flushreg instruction is provided to flush the local register
cache to the locations reserved for the registers on the
stack. This ensures that the values in external memory are
consistent with the values held in the local register cache.


3.8.5 LOCAL AND SYSTEM CALLS 


The 80960CA provides two methods for making procedure
calls: local calls and system calls. Local and system calls
differ in their operation and use in an application.

The local call instructions initiate a procedure call using
the call and return mechanism described earlier. The stack
frames for these procedure calls are allocated on the local
procedure stack. A local call is made using either of two
local call instructions: call or callx. The call instruction
specifies the address of the called procedure using an IP
plus displacement addressing mode with a range of -2^23 to
2^23 - 4 bytes from the current IP. The callx (call extended)
instruction specifies the address of the called procedure
using any of the 80960’s addressing modes.
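
As a rough illustration of the displacement limit for the call instruction, the check below tests whether a target is reachable from the current IP. The function name is hypothetical, and the word-alignment test is an assumption suggested by the upper bound of 2^23 - 4.

```python
# Hypothetical sketch: can `call` reach target_ip from current_ip?
MIN_DISPLACEMENT = -(1 << 23)      # -2^23 bytes
MAX_DISPLACEMENT = (1 << 23) - 4   # 2^23 - 4 bytes


def call_reaches(current_ip, target_ip):
    """Return True if the displacement fits the range quoted in the text."""
    displacement = target_ip - current_ip
    in_range = MIN_DISPLACEMENT <= displacement <= MAX_DISPLACEMENT
    word_aligned = displacement % 4 == 0  # assumption: word-aligned targets
    return in_range and word_aligned
```

A target outside this range would need callx, which accepts any addressing mode.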


A system call is made using the calls instruction. This
call is similar to a local call except that the processor
gets the IP for the called procedure from a data structure
called the system procedure table. The calls instruction
requires a procedure number operand. This procedure number
serves as an index into the system procedure table, which
contains IPs for specific procedures. The system procedure
table is shown in Figure 3-8.


[Figure: system procedure table, holding the supervisor stack
pointer and the procedure entries. Each entry carries an entry
type field (00 = local, 10 = supervisor). Reserved fields are
initialized to 0.]

Figure 3-8. System Procedure Table

The system call mechanism supports two types of procedure
calls: system-local calls and system-supervisor calls (also
referred to as supervisor calls). The system-local call
performs the same action as the local call instructions with
one exception: the IP target for a system-local call is
fetched from the system procedure table. The supervisor call
differs from the local call as follows:


1) A supervisor call causes the processor to switch to
   another stack (called the supervisor stack).

2) A supervisor call causes the processor to switch to
   the supervisor execution mode and asserts the
   80960CA’s supervisor (SUP) pin for all bus accesses.


The system call mechanism offers several benefits. The 
system call promotes the portability of application soft- 
ware. System calls are commonly used for kernel serv- 
ices. By calling these services with a procedure number 
rather than a specific IP, application software does not 
have to be changed each time the implementation of the 
kernel service is modified. Additionally, the ability to 
switch to a different execution mode and stack allows 
kernel procedures and data to be insulated from appli- 
cation code. 


3.8.6 IMPLICIT PROCEDURE CALLS


The call and return mechanism described for procedure
calls applies to several classes of call instructions as
well as to the context switching initiated by interrupts
and faults. When an interrupt or fault condition occurs,
an implicit call is performed that saves the current state
of the processor before branching to the interrupt or
fault handling procedure. When this context switch occurs,
the local registers are saved and a new stack frame is
allocated. Additionally, the values of the AC register
and PC register are saved when the implicit call occurs.
These values are restored on the return from the interrupt
or fault handler.


3.9 Interrupts 


An interrupt is a temporary break in the control stream 
of a program so that the processor can handle another 
task. Interrupts may be triggered by the instruction 
stream or by hardware sources internal and external to 
the 80960CA. An interrupt request is associated with a
vector (i.e. an address) of an interrupt handling procedure.
The processor will branch to the handling procedure when an
interrupt is serviced. When the handling action is
completed, the processor is restored to its state prior to
the interrupt.


3.9.1 INTERRUPT VECTORS AND PRIORITY 


Interrupt vectors are simply instruction pointers
(addresses) to interrupt handling procedures. The 80960
architecture defines 248 interrupt vectors. This means
that 248 unique interrupt handling procedures may be used.
An 8-bit interrupt vector number is associated with each
interrupt vector. This number ranges from 8 to 255. Each
interrupt vector has a priority from 1 to 31, which is
determined by the 5 most significant bits of the interrupt
vector number. Priority 1 is the lowest priority and 31 is
the highest. Priority 0 interrupts are not defined.
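
The mapping from vector number to priority can be sketched in a few lines. The function name is hypothetical; the shift by 3 follows from taking the 5 most significant bits of an 8-bit number.

```python
# Hypothetical sketch: derive an interrupt's priority from its vector number.
def vector_priority(vector):
    """Return the priority (1 to 31) encoded in an 8-bit vector number."""
    if not 8 <= vector <= 255:
        raise ValueError("defined vector numbers range from 8 to 255")
    return vector >> 3  # keep the 5 most significant bits of the 8-bit value
```

Eight consecutive vector numbers therefore share each priority: vectors 8 to 15 have priority 1, and vectors 248 to 255 have priority 31.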


The 80960CA executes with a unique priority ranging
from 0 to 31. When an interrupt is serviced, the processor’s
priority switches to the priority corresponding to that of
the interrupt request. When a return from an interrupt
procedure is executed, the process priority is restored to
its value prior to servicing the interrupt. This priority
switching is handled automatically by the 80960CA.


The 80960CA compares its current priority and the priority
of an interrupt request to determine whether to service an
interrupt immediately or to delay service. If a requested
interrupt priority is greater than the processor’s current
priority or equal to 31, the processor services the
interrupt immediately; otherwise, the processor saves
(posts) the interrupt request as a pending interrupt so
that it can be serviced later. When the processor’s priority
falls below the priority of a pending interrupt, the pending
interrupt is serviced. With the mechanism described,
interrupts with a priority of 0 will never be serviced. For
this reason, vectors numbered 0 to 7 are not defined.
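
The service-or-post decision described above reduces to one comparison. The sketch below is illustrative only, and the function name is hypothetical.

```python
# Hypothetical sketch of the service-or-post decision described above.
def services_immediately(request_priority, current_priority):
    """Return True if the request is serviced now, False if it is posted."""
    return request_priority > current_priority or request_priority == 31
```

A priority 0 request can never exceed the processor's priority, which is why it would never be serviced.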


3.9.2 INTERRUPT TABLE 


The interrupt table (Figure 3-9) is an architecturally
defined data structure which holds the interrupt vectors
and information on pending interrupts. The first 36 bytes
of the table are used to post interrupts. The 31 most
significant bits in the 32-bit pending priorities field each
represent a possible priority (1 to 31) of a pending
interrupt. When the processor posts an interrupt in the
interrupt table, the bit corresponding to the interrupt’s
priority is set. For example, if an interrupt with a
priority of 10 is posted in the interrupt table, bit 10 is
set in the pending priorities field.


The pending interrupts field contains a 256-bit string in
which each bit represents an interrupt vector. When the
processor posts an interrupt in the interrupt table, the
bit corresponding to the vector number of that interrupt is
set.
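
Posting can be modeled with two integers standing in for the two fields. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes that bit n of the pending interrupts field corresponds to vector number n, just as bit 10 of the pending priorities field corresponds to priority 10.

```python
# Hypothetical model of posting an interrupt in the interrupt table.
def post_interrupt(pending_priorities, pending_interrupts, vector):
    """Return the updated (pending_priorities, pending_interrupts) pair."""
    if not 8 <= vector <= 255:
        raise ValueError("defined vector numbers range from 8 to 255")
    priority = vector >> 3                 # 5 most significant bits
    pending_priorities |= 1 << priority    # 32-bit pending priorities field
    pending_interrupts |= 1 << vector      # 256-bit pending interrupts field
    return pending_priorities, pending_interrupts
```

For example, posting vector 80 (priority 10) into an empty table sets bit 10 of the first field and bit 80 of the second.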


Portions of the interrupt table are cached on-chip in a
non-transparent fashion. This caching is implemented
to minimize interrupt latency by reducing the number
of accesses to the table in external memory when an
interrupt is serviced.

Figure 3-9. Interrupt Table


3.9.3 INTERRUPT STACK 


Stack frames for interrupt handling procedures are allo- 
cated on a separate interrupt stack. The interrupt stack 
can be located anywhere in the processor’s address 
space. The beginning address of the interrupt stack is 
specified when the processor is initialized. 


3.9.4 INTERRUPT HANDLING ACTION 


When an interrupt is serviced, the processor saves the 
processor state and calls the interrupt procedure. The 
processor state is restored upon return from the inter- 
rupt procedure. 


This interrupt service mechanism is handled by an im- 
plicit call operation. When the interrupt is serviced, the 
current local registers are saved. A new local register 
set and stack frame are allocated on the interrupt stack 
for the interrupt handler procedure and the processor 
switches to supervisor execution mode. In addition to 
the local registers, the current value of the AC and PC 
registers are saved as an interrupt record on the inter- 
rupt stack. 


3.9.5 PENDING INTERRUPTS 


Any of the 248 interrupts can be requested by software.
The system control instruction (sysctl) is provided to
support this feature. When the system control instruction
requests an interrupt, one of two actions may occur
depending on the priority of the requested interrupt and the
current process priority: 1) the interrupt is serviced
immediately, or 2) the interrupt is posted (the pending
priorities field and the pending interrupts field are
modified to reflect a pending interrupt).


Interrupts may also be requested by hardware sources
internal and external to the 80960CA. Managing the
hardware sources and posting these interrupts is handled
by the interrupt controller. Interrupts requested by
hardware are posted in an internal register, not in the
interrupt table. A mask register enables or disables
interrupts from each hardware source. Requesting and
posting hardware interrupts is described in Section 4.4,
Interrupt Controller.


3.9.6 INTERRUPT LATENCY 


The time required to perform an interrupt task switch 
is referred to as the interrupt latency. The latency is the 
time measured between the activation of an interrupt 
source and the execution of the first instruction for the 
interrupt-handling procedure for the source. 


Interrupt latency for the 80960CA varies depending on
conditions such as:

- Complex instructions are executing when the interrupt
  occurs (e.g. sysctl, call, ret, etc.).

- Outstanding loads to a local register are pending,
  delaying the interrupt context switch.

- Division, multiplication, or other multi-cycle
  instructions with a local register as destination are
  executing.


The 80960CA has been designed to optimize latency
and throughput for interrupts. Two processor features
are designed for this purpose:

First, in the interrupt table, all interrupt vectors with
an index whose least significant four bits are 0010 (binary)
can be cached in internal data RAM. The processor will
automatically read these vectors from data RAM when
the interrupt is serviced. This feature reduces the added
latency due to an external access of the interrupt table
for that vector. The NMI vector is always cached in
data RAM.
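
To see which vectors qualify for caching under this rule, the sketch below enumerates the defined vector numbers (8 to 255) whose low four bits are 0010 binary. The function name is hypothetical, and the NMI vector, which is cached separately, is not modeled.

```python
# Hypothetical sketch: list the vector numbers eligible for data RAM caching.
def cacheable_vectors():
    """Return defined vector numbers whose low four bits equal 0b0010."""
    return [v for v in range(8, 256) if v & 0xF == 0b0010]
```

This yields one cacheable vector in every group of 16, starting at 18 and ending at 242.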


Second, an instruction cache locking mechanism allows
interrupt procedures or segments of interrupt procedures
to be stored in the instruction cache. These routines are
always executed from the internal cache, which eliminates
external code fetches, reduces latency, and increases
throughput for the interrupt.


3.10 Fault Handling and Instruction Tracing


The 80960CA is able to detect various conditions in
code or in its internal state that could cause the processor
to deliver incorrect or inappropriate results or that
could cause it to head down an undesirable control
path. These conditions are referred to as faults. The
80960 architecture provides fault handling mechanisms
to detect and, in most cases, fully recover from a fault.


The 80960CA provides on-chip debug support by trig- 
gering trace events and servicing the trace fault. A trace 
event is activated when a particular instruction or type 
of instruction is encountered in an instruction stream. 
The trace event optionally signals a fault. A fault han- 
dling procedure for the trace fault can act as a debug 
monitor and analyze the state of the processor when the 
trace event occurred.


3.10.1 FAULT TYPES AND SUBTYPES 


All of the faults that the processor detects are predefined.
These faults are divided into types and subtypes, each of
which is given a number. Table 3-4 lists the faults that the
processor detects arranged by type and subtype.


[Figure 3-10: fault table. Entries and their byte offsets:]

    Parallel Fault Entry        0
    Trace Fault Entry           8
    Operation Fault Entry      16
    Arithmetic Fault Entry     24
    Constraint Fault Entry     40
    Protection Fault Entry     56
    Type Fault Entry           80

    All other entries: Reserved (Initialize to 0)

Table 3-4. Fault Types and Subtypes

    Fault Type    Fault Subtype             Fault Record
    Parallel                                XX00 00XX
    Trace         Instruction Trace         XX01 0002
                  Branch Trace              XX01 0004
                  Call Trace                XX01 0008
                  Return Trace              XX01 0010
                  Prereturn Trace           XX01 0020
                  Supervisor Trace          XX01 0040
                  Breakpoint Trace          XX01 0080
    Operation     Invalid Opcode            XX02 0001
                  Unimplemented             XX02 0002
                  Invalid Operand           XX02 0004
    Arithmetic    Integer Overflow          XX03 0001
                  Arithmetic Zero-Divide    XX03 0002
    Constraint    Range                     XX05 0001
                  Privileged                XX05 0002
    Protection                              XX07 0001
    Type                                    XX0A 0001

NOTE: X refers to preserved locations in the fault record.

[Figure 3-10, continued: fault table entry formats. A local
procedure fault table entry holds the address of the fault
handler. A system procedure table fault table entry holds a
procedure index, with a field set to 0000027FH.]

[Figure: fault record fields: process controls, arithmetic
controls, fault flags, fault type, fault subtype, address of
faulting instruction, reserved.]

Figure 3-11. Fault Record


3.10.2 FAULT TABLE 


The fault table (Figure 3-10) provides the processor
with a pathway to fault handling procedures. The fault
table is an architecture-defined data structure, which
may be located anywhere in the processor’s address
space. The location of the fault table is specified at
initialization. When a fault occurs, an entry in the table
is selected based on the type of fault that occurs. The
entry in the fault table contains a pointer to a specific
fault handler.


The fault table can contain two types of entries (Figure
3-10). The first type of entry is simply a pointer to the
address of the fault-handling procedure. The second
type of entry is an index into the system-procedure table.
Fault-handling procedures accessed through the
system-procedure table may be executed in user or
supervisor execution mode.


3.10.3 FAULT HANDLING ACTION 


When a fault occurs, the processor performs an implicit
call operation to the procedure specified in the fault
table. In addition to performing the implicit call
operation, the processor creates a fault record in its newly
allocated stack frame. This fault record contains
information on the state of the processor when the fault
occurred and the fault type and subtype (Figure 3-11).
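
A fault handler typically starts by picking the type and subtype out of the fault record. The sketch below is a reading of the "XXtt 00ss" notation in Table 3-4, not a confirmed layout: it assumes the type occupies bits 16 to 23 and the subtype bits 0 to 7 of one 32-bit word. All names are hypothetical, and the Protection and Type names are taken from the fault table figure.

```python
# Hypothetical decoder for the fault type/subtype word listed in Table 3-4.
FAULT_TYPES = {
    0x00: "Parallel",
    0x01: "Trace",
    0x02: "Operation",
    0x03: "Arithmetic",
    0x05: "Constraint",
    0x07: "Protection",
    0x0A: "Type",
}


def decode_fault_word(word):
    """Return (type name, subtype bits) for a 32-bit fault record word."""
    fault_type = (word >> 16) & 0xFF   # assumed position of the type field
    subtype = word & 0xFF              # assumed position of the subtype field
    return FAULT_TYPES.get(fault_type, "Reserved"), subtype


def fault_table_offset(fault_type):
    """Byte offset of a type's fault table entry (8 bytes per entry)."""
    return fault_type * 8
```

The 8-byte entry spacing matches the offsets shown in the fault table figure (for example, type 05 at offset 40 and type 0A at offset 80).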


Some faults can be recovered from easily. When recovery
from a fault is possible, the processor’s fault handling
mechanism allows the processor to automatically resume
work where the fault was signalled. The resumption action
is initiated with the ret instruction. If simple recovery
from a fault is not possible, then the fault handling
procedure may call a debug monitor, initiate a reset, or
take other actions to recover from the fault.


3.10.4 TRACING AND DEBUG 


The 80960CA provides a facility for monitoring the activity
of the processor by tracing the instruction stream. A trace
event occurs at points in a program where certain types of
instructions are encountered or a certain IP or data address
is encountered. When a trace event occurs, a trace fault can
be generated and a trace-fault handler called which displays
or analyzes the state of the processor.


3.10.4.1 Trace Events 


The Trace Control (TC) Register (Figure 3-12) is used 
to specify the types of instructions which cause trace 
events. When a mode bit in the TC register is set, spe- 
cific instructions will generate trace events. For exam- 
ple, if the branch trace mode bit is enabled and a 
branch instruction is executed, a branch trace event will 
be signalled. An event flag is used to record trace 
events. A single event flag is provided for each mode 
bit. Any trace event generates a trace fault when the 
trace enable bit in the process control register is set. 


The 80960CA recognizes 7 trace events. These events 
are described below. 


Instruction Trace Event: Signalled each time an instruction
is executed. This trace event can be used with a debug
monitor to single-step the processor.

Branch Trace Event: Signalled each time a branch
instruction is executed. For conditional branch
instructions, this event is only signalled when the branch
is taken. Branch-and-link, call, and return instructions do
not signal this trace event.

Call Trace Event: Signalled each time a branch-and-link or
call instruction is executed. Implicit calls, such as those
used in interrupt or fault handling, also signal this event.
When a call trace event occurs, the prereturn trace flag
(bit 3 in local register r0) is set by the processor to
indicate a prereturn trace pending.

Pre-Return Trace Event: Signalled just prior to any ret
instruction. This event is only signalled if the pre-return
trace flag in register r0 is set. Since the pre-return trace
flag is set when a call trace event occurs, the call trace
mode must be enabled before a pre-return trace event can be
signalled.

Return Trace Event: Signalled each time a ret instruction
is executed.

[Figure: Trace Control register. Mode bits: instruction,
branch, call, return, prereturn, supervisor, and breakpoint
trace modes. Event flags: instruction, branch, call, return,
prereturn, supervisor, and breakpoint trace; data address
breakpoints 0 and 1; instruction address breakpoints 0 and 1.
Reserved bits are initialized to 0.]

Figure 3-12. Trace Control Register

Supervisor Trace Event: Signalled each time a calls
instruction is executed where the selected entry type is
supervisor, or when a ret from supervisor mode is executed.

Breakpoint Trace Events: Signalled each time a mark
instruction, fmark instruction, or specified address is
encountered in the instruction stream. The mark instruction
signals an event only when the breakpoint trace mode is
enabled; the fmark (force mark) instruction generates a
breakpoint trace event regardless of the value of the
breakpoint trace mode bit.


Two IP breakpoint registers and two internal data address
breakpoint registers are provided on the 80960CA. These
breakpoints are loaded with an instruction or data address
using the system control (sysctl) instruction. When the
address is encountered and the breakpoint trace mode bit is
set, a breakpoint trace event occurs. A corresponding
instruction or data address event flag is set in the TC
register when the address is encountered.


3.10.5 PROCESSOR INITIALIZATION 


The Initial Memory Image (IMI) comprises the data
structures needed to initialize the 80960CA (Figure 3-13).
The initialization boot record, in reserved memory
beginning at FFFFFF00H, contains a pointer to the Processor
Control Block (PRCB). The PRCB in turn holds pointers to the
data structures which are necessary to execute code on the
80960CA. The PRCB also holds several fields which contain
information to initially configure the 80960CA.


Processor initialization begins by asserting the RESET pin.
At initialization the processor optionally performs an
internal self-test. A bus confidence test is also performed
by calculating a checksum of 8 words read from external
memory. If either of these self-tests fails, the FAIL pin
indicates the failure and the processor aborts
initialization. If the self-tests pass, the 80960CA
continues with initialization and branches to the first
address of the user’s code.

[Figure: initial memory image.
Fixed data structures: the initialization boot record at
FFFFFF00H holds the bus configuration (least significant
byte), the First Instruction Pointer at FFFFFF10H, the PRCB
Pointer at FFFFFF14H, and 6 check words (for the bus
confidence self-test) from FFFFFF18H through FFFFFF2CH.
Relocatable data structures: user code, the PRCB (byte
offsets 0H through 24H), the system procedure table, and
other architecturally defined data structures (not required
as part of the IMI).]

Figure 3-13. Initial Memory Image


4.0 80960CA SYSTEM IMPLEMENTATION


This section is an overview of the peripherals integrated
with the 80960CA core. The features and operation of
the Bus Controller, DMA Controller, Interrupt Controller,
and the interfaces between these peripherals and the core
are described.


4.1 Peripheral Interface


A program communicates with the on-chip peripherals
by reading or modifying the special function registers
(SFRs) or by loading control registers. The SFRs generally
serve to transfer status information and data between a
peripheral and the core, and the control registers serve to
configure the peripherals. SFRs are accessed directly as
instruction operands. The control registers are loaded by
using the system control (sysctl) instruction.


4.2 Bus Controller Unit


The Bus Controller Unit (BCU) manages the data and 
instruction path between the 80960CA and external 
memory. Data operations and instruction fetches share 
a 32-bit data bus. Memory addresses are output on a 
separate 32-bit address bus. The BCU incorporates sev- 
eral advanced features to simplify the bus interface to 
external memory. A programmable memory region con- 
figuration table allows the characteristics of the exter- 
nal bus to be programmed differently for 16 separate 
regions in memory. The attributes of the external bus 
which are programmable include wait states and exter- 
nal ready control, data bus width (8, 16, or 32 bits), 
burst mode, address pipelining, and byte ordering. The 
region programmable bus options are described in this 
section. 


4.2.1 BUS TRANSFERS, ACCESSES, AND 
REQUESTS 


The distinction between transfer, bus access, and bus
request, as these terms apply to the 80960CA, must be
presented before beginning a discussion of the BCU.


Transfer: A bus transfer is defined simply as a movement
of code or data between a memory system and the 80960CA.
A write transfer occurs when the memory system is the
destination of a data movement. A read transfer occurs when
the 80960CA is the destination for a data or a code fetch
from memory.


Bus Access: A bus access is defined as an address cycle
and one or more transfers. In burst mode, an access can
consist of a single address cycle and 1 to 4 transfers.


Bus Request: A bus request is issued by the core and
directed to the Bus Controller. A bus request is sent to
the BCU when a load, store, or an atomic instruction is
executed, or when an instruction fetch is needed. Bus
requests are also issued by the core to perform DMA
transfers. A bus request can consist of one or more bus
accesses. For example, an aligned word (32-bit) request
to an 8-bit memory region will result in four byte-length
accesses.
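
The word-to-8-bit-region example generalizes to a simple count. The sketch below is an illustration only: the function name is hypothetical, and it assumes an aligned request in a non-burst region, so that every access carries exactly one transfer.

```python
# Hypothetical sketch: accesses needed for an aligned, non-burst request.
def accesses_for_request(request_bytes, bus_width_bits):
    """Return the number of bus accesses for one aligned bus request."""
    if bus_width_bits not in (8, 16, 32):
        raise ValueError("bus width must be 8, 16, or 32 bits")
    bytes_per_transfer = bus_width_bits // 8
    # Round up so a request narrower than the bus still needs one access.
    return -(-request_bytes // bytes_per_transfer)
```

With burst mode enabled, several of these transfers could share one address cycle, so the access count would be lower than this sketch reports.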


4.2.2 BUS CONTROL COPROCESSOR 


The 80960CA’s peripherals are often referred to as
coprocessors, since their operation is decoupled from the
execution of the instruction stream. As an integrated
coprocessor, the BCU receives bus requests and
independently carries out the action of moving data or code
between the processor and external memory. The BCU
uses a three-deep queue to store pending bus requests.
The queue decouples the core from the BCU, since a
series of adjacent requests may be issued faster than the
BCU can service each request. Two of the three queue
entries store requests from a user’s program (loads,
stores, fetches, etc.). The third queue entry is used by
requests originating from a DMA operation. This
queue entry takes user requests when the DMA is
turned off. The 80960CA alternates service of requests
issued by the user program and requests issued by a
DMA operation.


4.2.3 SIGNAL DESCRIPTIONS 


The external bus signals consist of 30 address signals, 4
byte enables, 32 data lines, and various control signals.


D31-D0    32-bit Data Bus (bidirectional): 32-, 16-, and
          8-bit values are transmitted and received on these
          lines. The 8- and 16-bit quantities are transferred
          on the low order data lines when a memory region is
          configured for an 8- or 16-bit bus, respectively.

A31-A2    30-bit Address (outputs): The 30-bit address bus
          identifies all external addresses to word (4-byte)
          boundaries. The byte enable lines indicate the
          selected byte in each word.

BE3-BE0   Byte Enables (outputs): The byte enables select
          which of 4 addressed bytes are active in a memory
          access. When a memory region is configured for an
          8-bit bus width, BE1 and BE0 act as the lower two
          bits of the address. For a 16-bit memory region,
          BE1, BE3, and BE0 are encoded to provide A1, BHE,
          and BLE respectively.

W/R       Write or Read (output): This signal is low for
          read accesses and high for write accesses.

ADS       Address Strobe (output): Indicates valid address
          and the start of a new bus access.

DT/R      Data Transmit or Receive (output): Direction
          control for data transceivers; similar to W/R.

DEN       Data Enable (output): Low during a bus request
          after the first address cycle. This signal is used
          to control data transceivers and to indicate the
          end of a bus request.

WAIT      Wait (output): Indicates that wait states are
          being inserted by the internal wait state
          generator.

READY     Ready (input): Signals that data is valid for a
          read transfer or ends data hold for a write
          transfer. This function can be disabled for a
          memory region.

BTERM     Burst Terminate (input): Terminates a burst
          access. Another address is generated to complete
          the request when the signal is deasserted. This
          function can be disabled for a memory region.

D/C       Data or Code (output): Indicates a data transfer
          or a code fetch.

DMA       DMA Access (output): Indicates whether a bus
          request was initiated by the user program or by
          the DMA.

SUP       Supervisor Access (output): Indicates that a bus
          access originated from a bus request issued in
          supervisor mode. This signal can be used to
          protect system data structures or peripherals
          from errant modification by the user code.

LOCK      Lock (output): Indicates that an atomic memory
          operation is in progress. This signal can be used
          to inhibit external agents from modifying memory
          which is atomically accessed.

BLAST     Burst Last (output): Indicates the last transfer
          in a burst access.

HOLD      Hold (input): HOLD can be used by a bus requester
          to request access to the bus. The processor
          asserts HOLDA after the current bus request or
          locked requests have completed.

HOLDA     Hold Acknowledge (output): Indicates to a bus
          requester that the processor has relinquished
          control of the bus.

BREQ      Bus Request (output): Indicates that requests are
          queued in the bus controller and are waiting to be
          serviced. BREQ can be used for external bus
          arbitration logic in conjunction with HOLD and
          HOLDA to regain bus mastership.


Figure 4-1 shows the timing for a simple, non-burst,
non-pipelined read and write access. The timing relations
for the key control signals are shown in this figure.

Figure 4-1. Basic Read and Write Request


4.2.4 MEMORY REGION CONFIGURATION TABLE


The BCU can be programmed differently for 16 separate
sections (referred to as regions) of the address space.
The four most significant bits of a memory address define
the location of each region in memory. The bus
characteristics in a region are specified in the memory
region configuration table. When a bus request is serviced,
the BCU accesses the configuration table entry for
the region addressed and services the request based on
the bus characteristics programmed for that region.
The characteristics programmable for each region are
listed below:

- Burst Mode (on/off)

- Wait States (5 parameters)

- Bus Width (8-, 16-, or 32-bit)

- Ready Inputs (on/off)

- Address Pipelining (on/off)

- Byte Ordering (Big/Little Endian)


The flexibility of region programming simplifies the bus
interface in applications where a memory system is
made up of a variety of sub-systems, such as SRAM,
DRAM, ROM, and memory mapped peripherals. Each
memory sub-system can be mapped into a different region
in memory, and that region can be configured specifically
for the requirements of the particular memory sub-system.


The configuration table is made up of 16 on-chip control
registers (Figure 4-2). Each register is programmed
with the configuration information for a single region.
Since the region table is located on-chip, access to region
information does not affect the performance of the bus.


4.2.4.1 Burst Accesses 


The 80960CA BCU is capable of burst accesses to
memory systems which are designed to support this feature.
Burst mode is intended to get the most performance
from low-cost memory systems. A burst access is a
single address cycle followed by successive data or
instruction transfers. The transfers reference data or
instructions at sequential addresses starting at the address
which began the burst access (Figure 4-3). In a burst
memory system, the upper 28 bits of an address remain
fixed while the lower two bits A2 and A3 increment to
access subsequent locations.
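
As a rough illustration of the address behavior described above, the following Python sketch lists the word addresses of one burst. It assumes a 32-bit bus with one word per transfer; the function name and the error handling are hypothetical.

```python
def burst_word_addresses(start: int, words: int) -> list:
    """Word addresses of one burst, assuming a 32-bit bus (one word per transfer)."""
    if start % 4:
        raise ValueError("start must be word aligned")
    first = (start >> 2) & 0x3            # value of address bits A3:A2
    if not 1 <= words <= 4 - first:
        raise ValueError("burst would cross a 16-byte boundary")
    base = start & ~0xF                   # upper 28 bits stay fixed
    # Only A3:A2 change from one transfer to the next.
    return [base | ((first + i) << 2) for i in range(words)]
```

Because only A2 and A3 change, a burst in this model never leaves the 16-byte block in which it started.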


Wait state timing for the first access of a burst request 
is controlled independently from the timing for subse- 
quent accesses. A memory sub-system using static col- 
umn mode or page mode DRAMs, for example, can 
take advantage of the short column access times for 
these devices by using burst mode. Interleaved ROM or 
EPROM systems can also be constructed which simul- 
taneously access several words and then use burst mode 
to multiplex the multi-word array onto the data bus. 


[Figure residue: REGION TABLE ENTRY bit fields, including BURST ENABLE, READY ENABLE, PIPELINE ENABLE, and BYTE ORDER]

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Figure 4-2. Memory Region Configuration Table 
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Read Request: 
[Timing diagram residue: Clock, Address, Data, Data Transfer, and Bus Access traces (drawing 270669-21)]


[Timing diagram residue: read request and write request, each with Wait State Counter, Clock, Data Transfer, and Bus Access traces (drawings 270669-23 and 270669-24)]

Figure 4-4. Programmable Wait States 
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4.2.4.2 Programmable Wait State Generation 


The 80960CA may be interfaced with a variety of mem- 
ory sub-systems and peripherals with a minimum sys- 
tem cost and complexity. To achieve this interface flexi- 
bility, the 80960CA implements an internal program- 
mable wait state generator. Internally generated wait 
states eliminate the potential system delays which come 
from generating wait states with external logic. 


Wait states are programmed for each region in the 
memory region configuration table. The number of wait 
states is programmable over a range which allows effi- 
cient control of memory devices ranging from ultra-fast 
SRAMs to slow peripherals. An external ready signal is 
also provided for external wait state control. 


The wait states which can be generated by the
80960CA are shown in Figure 4-4. In this figure, N is the
number of wait states inserted. The wait states for read
accesses and for write accesses are described by three
parameters each. For read accesses, N_RAD is the number
of states between the address cycle and the first
data cycle, and N_RDD is the number of states between
consecutive data cycles in a burst access. For writes,
N_WAD is the number of states that data is held after an
address cycle, and N_WDD is the number of states that
data is held for consecutive data cycles in a burst write.
For both reads and writes, N_XDA is the number of
dead cycles after the last data cycle and before the next
address.
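
The effect of these parameters on access length can be estimated with a small Python sketch. It is a simplified model, not a timing specification: it assumes that the address cycle and each data cycle take one clock apiece and that the access is not pipelined. The function name is hypothetical.

```python
def access_clocks(n_ad: int, n_dd: int, n_xda: int, transfers: int = 1) -> int:
    """Estimate clocks for one non-pipelined access.

    n_ad  -- N_RAD (reads) or N_WAD (writes)
    n_dd  -- N_RDD (reads) or N_WDD (writes), used only for burst transfers
    n_xda -- N_XDA, dead cycles after the last data cycle
    Assumes one clock per address cycle and one clock per data cycle.
    """
    if transfers < 1:
        raise ValueError("an access has at least one transfer")
    address_cycle = 1
    first_data = n_ad + 1                      # wait states, then the data cycle
    later_data = (transfers - 1) * (n_dd + 1)  # remaining burst transfers
    return address_cycle + first_data + later_data + n_xda
```

Under these assumptions, a four-word burst read with N_RAD = 2, N_RDD = 1, and N_XDA = 0 occupies 1 + 3 + 3 x 2 = 10 clocks.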


4.2.4.3 READY Control


The memory region configuration table allows the 
ready input (READY) to be enabled or disabled for 
each region. If the ready input is disabled, the external 
input has no effect on the wait states generated for a 
memory access; all wait states are generated internally. 
If the ready input is enabled, it works in conjunction 
with the programmable wait state generator. In this 


[Timing diagram residue: Wait State Counter, Clock, Address, Data, Data Transfer, and Bus Access traces]
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Figure 4-5. Pipelined Read Request 
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case, the ready input has no effect until the number of 


programmed wait states has expired. When the wait
state counter reaches 0, the ready input is sampled, and
wait states continue or are terminated based on the value
of the ready input. In order to gain complete external
control over wait states, all wait state parameters
for a region can be set to 0.
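
The interaction between the programmed count and the ready input can be modeled as below. This is a behavioral sketch only: it assumes the ready input is sampled once per clock after the counter expires and treats `True` as "ready asserted". The function name is hypothetical.

```python
def wait_states(programmed: int, ready_samples) -> int:
    """Count wait states for one data cycle in a region with READY enabled.

    programmed    -- wait states generated internally before READY is examined
    ready_samples -- iterable of booleans, one per clock after the counter hits 0
    """
    count = programmed            # READY has no effect while the counter runs
    for ready in ready_samples:
        if ready:
            return count          # data cycle completes on this clock
        count += 1                # READY not asserted: insert one more wait state
    raise RuntimeError("READY was never asserted")
```

Setting `programmed` to 0 reproduces the fully external case mentioned above, where the ready input alone determines the number of wait states.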


4.2.4.4 Pipelined Reads 


The 80960CA BCU provides an address pipelining 
mode (Figure 4-5) to optimize the performance of in- 
struction and data fetches from external memory. 
When the pipelined read mode is enabled, an address 
cycle overlaps with the last data cycle in each access, 
effectively reducing the total time needed for each ac- 
cess. Pipelining mode is selected in each region by pro- 
gramming the memory region configuration table. 


4.2.4.5 Byte Ordering 


One of two configurations for byte ordering, often
referred to as little endian or big endian, is selected for
each region by programming the memory region
configuration table. The byte ordering options make the
80960CA capable of sharing memory with a processor
which uses either byte ordering scheme. Byte ordering
refers to the way that the 80960CA relates internal data
to the way that data is stored or fetched from memory.


The little endian configuration orders the bytes in a
short-word or word so that the least significant byte of
the quantity is positioned at the lowest address and the
most significant byte at the highest address in memory.
Conversely, for the big endian configuration, the least
significant byte is positioned at the highest address, and
the most significant byte at the lowest address. For
example, for little endian ordering, byte 0 for word data
would be found in memory at an address of the form
XXXX XXX0H and, for big endian, at address XXXX
XXX3H.
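
The two orderings can be demonstrated with Python's built-in integer conversions. The sketch models memory as a `bytearray`; the helper names are hypothetical and the example says nothing about how the BCU implements the option.

```python
def store_word(memory: bytearray, address: int, value: int, big_endian: bool) -> None:
    """Store a 32-bit word using the byte order selected for the region."""
    order = "big" if big_endian else "little"
    memory[address:address + 4] = value.to_bytes(4, order)

def load_word(memory: bytearray, address: int, big_endian: bool) -> int:
    """Load a 32-bit word using the byte order selected for the region."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 4], order)

memory = bytearray(8)
store_word(memory, 0, 0x11223344, big_endian=False)  # byte 0 (0x44) lands at offset 0
store_word(memory, 4, 0x11223344, big_endian=True)   # byte 0 (0x44) lands at offset 4 + 3
```

In both cases a load with the same ordering returns the original value; only the placement of the individual bytes in memory differs.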




4.2.4.6 Data Alignment 


The 80960CA can service any aligned or non-aligned 
bus request. Aligned requests are directed to their natu- 
ral boundary in memory. In other words, the addresses 
for aligned requests are even multiples of the length of 
the data transferred. Non-aligned requests are not serviced
directly by the BCU but are assisted by microcode.
Microcode automatically breaks non-aligned requests 
into multiple aligned requests which are then reissued 
to the BCU. Depending on the degree of non-alignment 
and the length of the original request, the resulting re- 
quests by microcode will consist of a combination of 
byte, short-word, and double-word requests. The BCU 
is able to generate an operation-unaligned fault when a 
non-aligned bus request is first received. This fault can 
be selectively masked at initialization. 


4.3 DMA Controller 


The DMA controller is a high-performance, full-func- 
tioned integrated peripheral. The DMA controller can 
manage 4 channels of DMA transfer concurrent with 
program execution. Separate external control for each 
channel is provided. Each channel supports high-per- 
formance memory to memory transfers where the 
source and destination can be any combination of inter- 
nal data RAM or external memory. The DMA Con- 
troller supports various types of transfers such as high- 
speed fly-by transfers and data chaining with the use of 
linked descriptor lists in memory. 


The 80960CA’s DMA controller is implemented using 
dedicated hardware and microcode. Because of the effi- 
ciency of the core, it is possible for the microcode to 
execute DMA transfers at high speeds. DMA transfers 
are performed by the core concurrently with execution 
of the user’s program. Internal DMA logic is used for 
sampling requests, synchronizing transfers with exter- 
nal devices, and handling the service of multiple active 
channels. 


4.3.1 SIGNAL DESCRIPTIONS 


Twelve pins are dedicated to the DMA controller.
Three pins are associated with each DMA channel.
These pins are described below. In this description, the
pin number corresponds to the channel number. For
example, the DREQ0 pin is the request pin for
channel 0.
DREQ3-     DMA Request (input)—This input indicates
DREQ0      that an external device is requesting
           a DMA transfer. A DMA
           transfer refers to the complete transfer
           of one byte, short-word, word, or quad-word,
           depending on the transfer data
           width selected for the channel.
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DACK3-—- 


DACK0      DMA Acknowledge (output)—This
           output becomes active when the requesting
           device is accessed.


EOP3/TC3-  End of Process (input) or Terminal
EOP0/TC0   Count (output)—This pin functions either
           as an input (EOPx) or as an output
           (TCx). When programmed as an output,
           the pin is driven active for one
           clock after byte count reaches zero and
           a DMA terminates. When programmed
           as an input, an external device can
           cause the DMA operation to terminate.


4.3.2 DMA TRANSFERS 


The 80960CA DMA controller supports a variety of 
transfer modes and variations of these modes, allowing 
the DMA to adapt to a number of hardware systems 
and the performance requirements of these systems. 


4.3.2.1 Standard Block and Demand Mode 
Transfers 


A standard DMA transfer is made up of multiple bus 
requests. Loads from a source address are followed by 
stores to a destination address. The DMA controller 
issues the proper combination of these bus requests to 
execute the DMA transfer. For example, a typical 
DMA transfer between memory and an 8-bit peripheral 
could appear as a single byte load request directed to 
the source memory, followed by a single byte store re- 
quest directed to the 8-bit peripheral. 


The DMA controller has two basic transfer modes:
block mode (unsynchronized) and demand mode
(synchronized). Any DMA transfer will be serviced by one
of these basic transfer modes.


A block mode DMA is initiated by software. Block 
mode DMAs are generally between memory. Block 
mode DMA transfers are not synchronized with any 
type of request from an external device. Once the DMA 
begins, it will continue until the entire block is com- 
plete or until it is suspended. The source and destina- 
tion addresses for block mode transfers can be incre- 
mented or held constant for a DMA. 


A demand mode DMA is controlled by an external
device. Demand mode DMAs are generally between an
external device and memory. In demand mode, each
individual DMA transfer can be synchronized with a
request. The request is signalled when an external device
activates a DMA channel request pin (DREQ3-
DREQ0). The DMA controller acknowledges this request
with the DMA acknowledge pin (DACK3-
DACK0) when the requesting device is accessed. A demand
mode transfer may be synchronized with either
the source or the destination device.
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_ 4.3.2.2 Fly-by Transfers 


A fly-by transfer mode is provided for the most per- 
formance-critical DMA applications. Fly-by mode also 
makes very efficient use of the external bus during a 
DMA. Standard DMA transfers involve multiple bus 
requests: load requests directed to the source and a 
store request directed to the destination. Fly-by trans- 
fers only require a single bus request. For a fly-by trans- 
fer, memory sees a load or a store on the bus while the 
requesting device is selected by the DMA acknowledge 
pin. The data is never actually read from or written to
the 80960CA. For memory to device transfers, the
processor issues a load and, while reading the memory,
accesses the external device with the DMA acknowledge
pin. The data is then written directly to the destination
device with a single bus request. For a device to
memory transfer, the reverse operation is performed. 
The DMA issues a store, and, while writing the memo- 
ry, accesses the source device with the DMA acknowl- 
edge pin. In this case, the processor floats the data bus 
and the device’s data is written directly into memory. 


4.3.2.3 Data Chaining 


Each DMA channel can be configured in a data
chaining mode. In this mode, all transfer information is
taken from a linked-list descriptor in memory (Figure
4-6). Data chaining is started by specifying a pointer to
a descriptor in memory. The transfer continues until




[Figure residue: a First Descriptor Pointer held in an internal register and loaded by the user points to a linked list of descriptors in memory. Other labels: Not Used For Source Chaining, Terminate]

BC = Byte Count
SA = Source Address
DA = Destination Address
NPTR = Next Pointer

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the number of bytes in the byte count field in the de- 
scriptor is transferred. At this time, another linked-list 
descriptor may be executed. The next descriptor is
specified by the next-pointer field in the current
descriptor. Data chaining continues until a null pointer
is encountered in the next-pointer field. Data chaining
can be designated as source chaining, destination chaining,
or both.


In data chaining mode, an option exists which allows 
chaining descriptors to be updated while the DMA is 
running. When this option is enabled, the DMA sets a 
bit in the DMA’s special function register after loading 
a descriptor and then checks this bit before loading the 
next descriptor. If the bit has been cleared by the user, 
the DMA continues; otherwise, the DMA waits for the 
next descriptor to be set up and for the user to clear the 
bit. An interrupt can be generated when each buffer is 
complete or when the DMA is terminated with a null
pointer or the EOP pin. 
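
The descriptor walk described in this section can be sketched as follows. The example is a software model under stated assumptions: the field names follow the legend of Figure 4-6 (byte count, source address, destination address, next pointer), `None` stands in for the null pointer, memory is a `bytearray`, and the class and function names are hypothetical. The wait-bit handshake and the source-only or destination-only chaining variants are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Descriptor:
    byte_count: int
    source: int
    destination: int
    next: Optional["Descriptor"] = None   # None stands in for the null pointer

def run_chain(memory: bytearray, first: Optional[Descriptor]) -> int:
    """Copy every buffer in the chain and return the number of buffers moved."""
    buffers = 0
    descriptor = first
    while descriptor is not None:          # a null next pointer ends the chain
        src, dst, n = descriptor.source, descriptor.destination, descriptor.byte_count
        memory[dst:dst + n] = memory[src:src + n]
        buffers += 1                       # a real channel could interrupt here
        descriptor = descriptor.next
    return buffers
```

A chain of two descriptors therefore moves two buffers, and the point where `buffers` is incremented corresponds to the per-buffer interrupt option mentioned above.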


4.3.3 TRANSFER CHARACTERISTICS 


The DMA controller provides the programmer with a 
number of options for configuring the characteristics of 
a DMA transfer. Intelligent selection of transfer char- 
acteristics works to balance DMA performance and 
functionality with performance of the user program
when the DMA is in progress.


[Figure residue: Source Buffers, Destination Buffer]
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Figure 4-6. Source Data Chaining 


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The DMA controller provides features to optimize 
transfers by moving a maximum amount of data for 
each bus request issued. This is controlled by specifying 
the width of the source and destination directed bus 
requests for a DMA transfer, and by on-chip assembly 
or disassembly of the transfer when source and destination
are not of equal widths.


Data alignment is performed automatically by the 
DMA controller when the source and destination of a 
transfer are not aligned. The alignment algorithm is 
optimized for many transfers, providing a performance 
comparable to the aligned transfer cases. 


4.3.3.1 Transfer Data Length 


The transfer data length specifies the length of bus re- 
quests directed to the source and destination in a stan- 
dard DMA transfer. Byte, short, word, or quad-word 
loads and stores are selected for either source or desti- 
nation when a DMA channel is set up. Assembly and 
disassembly of data is automatically performed when 
the source and destination widths are different. This 
feature provides the most efficient use of the bus when 
DMA transfers occur between a source and a destina- 
tion with different external bus widths. 


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Address 

[Figure 4-7 residue: address labels 0000 0200H, 0000 0204H, 0000 0208H, 0000 020CH, and 0000 0304H]
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The DMA controller provides the option of using quad 
word transfers to enhance DMA performance. When 
quad transfers are specified, the DMA will request a 
four-word load request and four-word store request for 
each DMA transfer. The trade-off for the added DMA 
performance is latency on the external bus, preventing 
requests by the core, or by another DMA channel from 
being immediately serviced. 


4.3.3.2 Data Alignment 


The DMA controller supports transfer of source and 
destination data aligned to different byte boundaries in 
memory. The DMA implements microcode algorithms 
to transfer some non-aligned data with a performance 
level approaching that for aligned transfers. The DMA 
accomplishes this by attempting to issue the maximum
number of aligned bus requests during a DMA (Figure 
4-7). As shown, most of the overhead due to non- 
aligned DMAs is incurred at the beginning and end of 
the DMA. DMAs with low byte counts, therefore, do 
not benefit as much from the data alignment features of 
the DMA. The alignment feature is optimized for 8-bit 
to 8-bit, 32-bit to 32-bit and for 8-bit and 32-bit combi- 
nations of source and destination lengths. 
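
One way to picture the strategy of issuing as many aligned requests as possible is a greedy split of a byte range into the largest naturally aligned pieces. The Python sketch below is an assumption made for illustration, not the actual microcode algorithm, and the function name is hypothetical. For a 12-byte destination range starting at 0000 0303H it produces the same store sequence that Figure 4-7 lists.

```python
def aligned_requests(address: int, length: int) -> list:
    """Split a byte range into aligned byte, short, and word requests (greedy sketch)."""
    names = {1: "byte", 2: "short", 4: "word"}
    requests = []
    while length > 0:
        size = 4
        # Shrink the request until it is naturally aligned and fits the remainder.
        while address % size or size > length:
            size //= 2
        requests.append((names[size], address))
        address += size
        length -= size
    return requests
```

The unaligned byte and short requests appear only at the two ends of the range, which matches the observation that most of the overhead is incurred at the beginning and end of the DMA.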


Bus Operation     Address

load word         0000 0200H
store byte        0000 0303H
load word         0000 0204H
store word        0000 0304H
load word         0000 0208H
store word        0000 0308H
load word         0000 020CH
store short       0000 030CH
store byte        0000 030EH

[Figure residue: address label 0000 0300H; Byte Number labels]
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0000 0308H 


0000 030CH 
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Figure 4-7. DMA Data Alignment 
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4.3.3.3 Channel Priority 


The DMA controller arbitrates the priority of the 4
DMA channels. If multiple DMA channels are enabled,
the DMA controller will determine in which order
each channel is serviced.


The DMA controller can be configured in one of two
priority modes, fixed mode or rotating mode. The fixed
mode assumes a fixed priority for each channel, with
channel 0 having the highest priority, followed by channels
1, 2, and 3, with channel 3 having the lowest priority.
The rotating mode updates a channel’s priority to
the lowest priority after that channel’s DMA transfer is
serviced. This ensures that a single channel is never locked
out by other active channels. The priority sequence is always
in the same order, with priority rotating from the low
channel numbers to the high channel numbers.
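
The two priority modes can be compared with a small arbiter model. This Python sketch is an interpretation of the description above rather than a specification of the hardware: it assumes that in rotating mode the serviced channel moves to the end of a cyclic order that otherwise stays 0, 1, 2, 3. The class and method names are hypothetical.

```python
class DmaArbiter:
    """Pick which of four requesting DMA channels is serviced next."""

    def __init__(self, rotating: bool = False):
        self.rotating = rotating
        self.order = [0, 1, 2, 3]          # highest priority first

    def select(self, requesting):
        """Return the channel to service, or None if no channel is requesting."""
        for channel in self.order:
            if channel in requesting:
                if self.rotating:
                    # The serviced channel drops to lowest priority; cyclic order is kept.
                    i = self.order.index(channel)
                    self.order = self.order[i + 1:] + self.order[:i + 1]
                return channel
        return None
```

With channels 0 and 1 both requesting continuously, the fixed-mode arbiter always returns channel 0, while the rotating-mode arbiter alternates between the two.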


4.3.3.4 Performance and Latency
Considerations


DMA operations and the user program share the resources
of the core and of the external bus. DMA performance
and the performance of the user program are
coupled directly to the balance of load sharing between
these two processes. The core resources necessary to
perform a DMA transfer vary depending on the way a
channel has been configured. For example, byte assembly
and disassembly requires more processor overhead
per byte of transfer than does a transfer in which the
source and destination transfer lengths are equal. The
performance of a DMA is also tightly coupled to the
user program’s use of the external bus. If the user program
does not make frequent bus requests, the requests
by the DMA controller will be serviced with little or no
delay.


The user can enhance performance of the DMA with
trade-offs in system complexity and flexibility. Aligned
transfers eliminate the microcode overhead needed to
perform the internal alignments. DMAs between regions
of equal transfer widths eliminate overhead for
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assembly and disassembly. Source or destination mem- 


ory configured as burst memory will provide the most 


efficient use of the DMA controller when the quad-transfer
feature is enabled. Using the fly-by mode reduces
the number of bus requests needed for a DMA
since fly-by mode uses only a single load or a single
store request for each transfer.


4.3.4 DMA CONTROL AND CONFIGURATION 


The DMA Controller uses an SFR register, the DMA 
command (DMAC) register, and the setup DMA 
(sdma) instruction for configuration and control of a 
DMA. The sdma instruction is used to configure each 
DMA channel. Transfer widths, byte count, source and 
destination addresses for a DMA are specified in this 
instruction. 


The DMAC register (Figure 4-8) is described below.
The channel enable field enables a DMA once the
channel is set up. Clearing these bits will also cause a
DMA transfer to be suspended.


The terminal count field signals that byte count has 
reached zero and a DMA has ended. 


The channel active field indicates that a channel is idle
or active. If set, this bit indicates that the channel is
active. This implies that the channel is servicing a
transfer or has a request pending. The active bits are
status information only.


The channel done field indicates that a DMA operation
is complete. The done bits are status information only.


The channel wait field is used for handshaking with a 
user program in data chaining mode. The DMA sets 
these bits when a new linked-list descriptor is read. The 
DMA will not read the next descriptor until this bit is 
cleared by the user. The user can set up the next de- 
scriptor and then clear the channel wait bits to dynami- 
cally change descriptors. 


[Figure residue: 32-bit register, bits 31 through 0. Fields shown:]

Reserved (Initialize To 0)
Channel Enable (3-0)
Terminal Count (3-0)
Channel Active (3-0)
Channel Done (3-0)
Channel Wait (3-0)
Priority Mode Bit
Throttle Bit
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Figure 4-8. DMA Command Register 
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A priority mode bit selects rotating or fixed priority 
mode. 


The throttle bit selects the maximum amount of core 
resources that the DMA microcode will receive in rela- 
tion to the execution of the user program. 
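
A register made of per-channel fields like this is typically decoded with shifts and masks, as in the Python sketch below. The bit positions are an assumption: the sketch packs the five 4-bit fields in the order listed for Figure 4-8, starting at bit 0, and places the priority mode and throttle bits at bits 20 and 21. The exact layout should be confirmed against the register definition before use, and all names here are hypothetical.

```python
# ASSUMED layout: five 4-bit per-channel fields from bit 0 upward, then two mode bits.
FIELDS = ["channel_enable", "terminal_count", "channel_active",
          "channel_done", "channel_wait"]

def decode_dmac(value: int) -> dict:
    """Split a 32-bit DMAC value into per-channel flags (assumed bit positions)."""
    decoded = {}
    for index, name in enumerate(FIELDS):
        nibble = (value >> (4 * index)) & 0xF
        # One flag per channel, channel 0 first.
        decoded[name] = [bool((nibble >> ch) & 1) for ch in range(4)]
    decoded["priority_mode"] = bool((value >> 20) & 1)
    decoded["throttle"] = bool((value >> 21) & 1)
    return decoded
```

Keeping the field order in one list makes it easy to correct the sketch if the real bit assignment differs from the assumed one.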


4.3.5 DMA INTERRUPTS 


The DMA controller is the source of 4 hardware inter- 
rupts in the 80960CA. The DMA Controller can be 
programmed to request an interrupt when a DMA is 
complete, or when a buffer transfer is completed in 
chaining mode. Each channel requests a different inter- 
rupt. 


4.4 Interrupt Controller 


The 80960CA Interrupt Controller manages interrupts 
which are requested by external agents or by the DMA 
Controller. The interrupt controller manages 4 internal 
DMA interrupt sources, a single NMI (Non-Maskable 
Interrupt) pin, and 8 external interrupt pins. Up to 248 
external interrupt sources can be supported by the in- 
terrupt controller. The interrupt controller handles the 
prioritization of software interrupts, hardware inter- 
rupts, and the process priority, and signals the core 
when interrupts are to be serviced. The interrupt controller
provides the low-latency interrupt service featured
on the 80960CA.


4.4.1 EXTERNAL INTERRUPTS 


The 80960CA provides 8 interrupt pins and one NMI 
pin for detecting external requests. The interrupt con- 
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troller allows the 8 interrupt pins to be configured as 
dedicated inputs capable of requesting 8 interrupts, or 
as a vectored input capable of requesting up to 248 
interrupts. The NMI pin is always a dedicated input. 
The interrupt controller pins are described below.


XINT7-   External Interrupts (inputs)—These pins
XINT0    can be used as dedicated inputs or, acting
         together as an 8-bit number, can request any
         interrupt. The inputs are edge or level detected
         and are optionally debounced internally.


NMI      Non-Maskable Interrupt (input)—NMI requests
         the highest priority interrupt. NMI
         is always taken and, as the name implies, is
         neither maskable nor interruptible.


4.4.2 INTERRUPT MODES 


The 8 external interrupt pins can be configured in one 
of three modes: dedicated mode, expanded mode, or 
mixed mode (Figure 4-9). 


4.4.2.1 Dedicated Mode Interrupts 


In dedicated mode, each of the 8 interrupt pins acts as a 
dedicated input. When an external event is detected on 
an interrupt pin, a unique interrupt is requested for that 
pin. It is possible to map each dedicated pin to one of a 
number of possible interrupt vectors. This is accom- 
plished by programming the interrupt map (IMAP) 
control registers with an interrupt vector number for 
each pin. (Recall that interrupt vector numbers are
8-bit values which reference the 248 vectors in the
interrupt table.)
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Figure 4-9. Interrupt Modes 
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Only the upper four bits of the vector number can be 
programmed for a dedicated mode interrupt. The lower 
four bits are fixed at the value 0010 (binary). With four
programmable bits, one of 15 interrupt vectors is available
for each dedicated pin. These interrupt vectors span the
even priority levels from priority 2 to 30. The vector at
priority 0 is not defined.
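
The arithmetic behind these 15 vectors can be checked with a short Python sketch. It assumes the 80960 convention that eight vector numbers share one priority level (priority = vector number divided by 8), which is consistent with the even priorities 2 to 30 stated above. The function names are hypothetical.

```python
def dedicated_vector(upper_bits: int) -> int:
    """Vector number for a dedicated source: programmable upper nibble, lower nibble 0010."""
    if not 1 <= upper_bits <= 15:
        raise ValueError("upper four bits must be 1..15; 0 would select priority 0")
    return (upper_bits << 4) | 0b0010

def priority_of(vector: int) -> int:
    """Priority level of a vector, assuming eight vectors per priority level."""
    return vector >> 3

priorities = [priority_of(dedicated_vector(n)) for n in range(1, 16)]
```

Each step of the upper nibble adds 16 to the vector number, so the priority advances by two levels per step, which is why only the even levels are reachable.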


The 15 interrupt vectors available to dedicated sources 
can be cached in internal data RAM. If this interrupt 
vector caching feature is selected, the processor will
automatically fetch the vector from data RAM, eliminating
the latency caused by a bus request for a vector in
external memory.


The DMA Controller can request four interrupts to sig- 
nal the end of a DMA for each of four channels. The 
four interrupt signals from the DMA are handled by 
the interrupt controller in the same way as an interrupt 
pin configured as a dedicated input. Each of the four 
DMA sources may request one of 15 interrupts by pro- 
gramming the IMAP for that source. 


4.4.2.2 Expanded Mode Interrupts 


In expanded mode, external hardware considers the
interrupt pins (XINT0-XINT7) as an 8-bit binary number.
This number is used directly as the interrupt vector
number. Each of the 248 possible interrupt vectors can
be referenced in this way, allowing a separate external
source for each vector. External hardware is responsible
for recognizing individual hardware sources and
then driving the interrupt vector number corresponding
to that source onto the interrupt pins.


4.4.2.3 Mixed Mode Interrupts 


In mixed mode, the 8 interrupt pins are divided into 
two functional sets. One set functions in dedicated 
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mode, the other in expanded mode. In mixed mode, 
three pins are dedicated interrupt pins (XINT7-
XINT5). A programmable vector number is associated
with each of these pins. The remaining five interrupt
pins (XINT4-XINT0) are treated as the most significant
five bits of the expanded mode vector number. The
lower order bits are internally forced to 010 (binary) to form
the full 8-bit value for the vector number.
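
The way the vector number is formed in expanded and mixed modes can be written out as two small Python functions. This is a sketch under stated assumptions: the pin values are passed as plain integers with XINT0 as the least significant bit, electrical details such as signal polarity are ignored, and the function names are hypothetical.

```python
def expanded_vector(xint7_0: int) -> int:
    """Expanded mode: the 8-bit value on XINT7-XINT0 is the vector number."""
    return xint7_0 & 0xFF

def mixed_vector(xint4_0: int) -> int:
    """Mixed mode: XINT4-XINT0 form the upper five bits, the lower three are 010."""
    return ((xint4_0 & 0x1F) << 3) | 0b010
```

In mixed mode the five expanded pins can therefore select 32 different values, each landing on the vector whose lower three bits are 010.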


4.4.3 INTERRUPT CONTROLLER SETUP


The interrupt controller uses two special function regis- 
ters to manage interrupt requests by hardware sources. 
The hardware interrupt pending register (IPND) and 
the hardware interrupt mask register (IMSK) are ad- 
dressed as sf0 and sf1 respectively. A single bit in each 
register corresponds to each of the 8 possible external 
sources and 4 DMA sources for hardware interrupts. 
The IMSK register performs the function of masking
hardware interrupts and the IPND register implements
posting of interrupts requested by hardware. When
configured for expanded or mixed mode interrupts, bit
0 of the IMSK register globally masks the expanded
mode interrupts.


4.4.4 NON-MASKABLE INTERRUPT


In addition to the maskable hardware interrupts, a sin- 
gle Non-Maskable Interrupt (NMI) is provided. A dedi- 
cated NMI pin is used to request this interrupt. NMI is 
defined as a higher priority than any hardware inter- 
rupt, software interrupt, or process priority. The NMI 
procedure, therefore, can never be interrupted and 
must execute the return instruction before other proce- 
dures can execute. The NMI procedure is entered 
through vector 248. This vector is cached in internal
data RAM at initialization to reduce latency for the
NMI.
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APPENDIX A 
80960CA CORE IMPLEMENTATION


The 80960CA Core is a high-performance implementa- 
tion of the 80960 Core Architecture. This section brief- 
ly describes the microarchitecture of the 80960CA core 
and the key constructs used to achieve parallel instruc- 
tion execution. 


The 80960CA core can be divided into the 6 main sub- 
units listed below. 


— Instruction Sequencer 

— Register File 

— Execution Unit

— Multiply and Divide Unit 

— Address Generation Unit 

— Static Data RAM and Local Register Cache 


Figure A-1 is a simple block diagram of the 80960CA. 
The nucleus of the processor is the Instruction Se- 
quencer and Register File. The other subunits of the 
core, referred to as coprocessors, radiate from these 
units, connecting to either the register (REG) side or 
the memory (MEM) side of the processor. The Instruc- 
tion Sequencer issues directives, via the REG and 
MEM interfaces, which target a specific coprocessor. 
That coprocessor then executes an express function
virtually decoupled from the IS and the other coprocessors.
The REG and MEM data busses shown in Figure
A-1 are used to transfer data between the common
Register File and the coprocessors.


A.1 Instruction Sequencer 


The Instruction Sequencer (IS) decodes the instruction 
stream and drives the decoded instruction stream onto 
the coprocessor interfaces. In a single clock, the IS decodes
up to four instructions and issues up to three of these
instructions to the on-chip coprocessors or to the IS 
itself. One register (REG) format, one memory (MEM) 
format, and one control or control and branch (CTRL 
or COBR) format instruction can be issued at one time. 
These instructions are directed respectively to the REG 
coprocessors, the MEM coprocessors, or to the IS. The 
ability to issue multiple instructions in parallel can re- 
sult in the simultaneous execution of many instructions 
at once. An optimizing compiler or hand optimization 
of assembly code can easily produce an instruction 
stream which takes full advantage of the parallel execution
of the core.


A technique known as resource scoreboarding is used to 
manage the parallel execution of instructions and the 
common resources of the processor. A coprocessor, for 
example, can scoreboard itself, indicating that it cannot 
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act on another instruction until an instruction currently 
executing on that coprocessor is completed. A specific 
form of resource scoreboarding is referred to as register 
scoreboarding. When the computation stage of an in- 
struction takes more than one clock, the destination 
register or registers for the result are scoreboarded as 
busy. A subsequent operation needing that particular 
register will be delayed until the multi-clock operation 
is completed. Instructions which do not use the scoreboarded
registers can be executed in parallel.
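
Register scoreboarding as described above can be modeled with a table of busy registers. The Python sketch below is a simplified illustration, not a description of the actual hardware: clocks are counted as integers, only multi-clock operations mark their destination busy, and the class name, method name, and register names in the usage note are hypothetical.

```python
class Scoreboard:
    """Track which registers are waiting on a multi-clock result."""

    def __init__(self):
        self.busy_until = {}               # register name -> clock when result is ready

    def issue(self, clock: int, dest: str, sources, latency: int = 1) -> int:
        """Return the clock at which the instruction can actually issue."""
        start = clock
        for reg in [*sources, dest]:
            # Wait for any scoreboarded register this instruction needs.
            start = max(start, self.busy_until.get(reg, 0))
        if latency > 1:
            self.busy_until[dest] = start + latency   # scoreboard the destination
        return start
```

For example, if a 4-clock operation writing r4 issues at clock 0, an instruction reading r4 that arrives at clock 1 is held until clock 4, while an instruction that touches only other registers still issues at clock 1.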


The IS manages a three stage parallel instruction pipe- 
line (Figure A-2). In the first stage of the pipeline (pipe 
0), the address of the next instruction is calculated. 
This address may be the next sequential instruction, the 
target of a branch, or a location in microcode. In the 
second stage of the pipeline (pipe 1), the instructions 
are issued to the rest of the machine. In the third stage
(pipe 2), the instruction computation is started, and for
single cycle instructions, a result is returned.


Several microarchitectural features of the core are de- 
signed to minimize performance loss due to pipeline 
breaks.


Branch Prediction—To minimize pipeline breaks due to 
branching, the user can specify the direction that a con- 
ditional branch instruction will usually follow. The 
processor will execute along the specified instruction 
path with no pipeline break. If the branch direction 
specified was the direction actually selected by execu- 
tion of the conditional branch, no pipeline break oc- 
curs. The direction of the branch guess is determined 
by a bit value in the CTRL format instructions.


Register Bypassing—Register bypassing is a feature
which forwards the result of an instruction for immediate
use as the source of another instruction. This forwarding
occurs at the same time that the value is written
to its destination register. Bypassing the register file
saves the one clock cycle break which would otherwise
occur while waiting for the value to be written to the
register file and the register scoreboard to be cleared.


On-chip Cache—The on-chip instruction cache and lo- 
cal register cache eliminate many pipeline breaks which 
would otherwise occur if the IS is forced to wait for code or data to
be moved between the 80960CA and external memory. 


Register File Access—The Register File allows multiple 
instructions to gain access to the register set simulta- 
neously. This eliminates pipeline breaks which would 
be caused by a loss of access to the register set by any 
coprocessor. 


A.1.1 INSTRUCTION CACHE 


The IS includes a 1 Kbyte two-way set associative in- 
struction cache capable of delivering up to four instruc- 
tions each clock to the Instruction Sequencer. The 
cache allows inner loops of code to execute with no 
external instruction fetches.


A.1.2 MICROCODE ROM 


The 80960CA uses microcode ROM to implement com- 
plex instructions and functions. This includes calls, re- 
turns, DMA transfers, and initialization sequences. Mi- 
crocode provides an inexpensive and simple method for 
implementing complex instructions in the mostly RISC
environment of the 80960CA. When the IS encounters
a microcoded instruction, it automatically branches to
the microcode routine. The 80960CA performs this
microcode branch in 0 clocks.




Figure A-2. Instruction Pipeline 

A.2 Register File


The Register File (RF) contains the 16 local and 16 
global registers. The register file has six ports (Figure 
A-3), allowing parallel access of the register set by sev- 
eral 80960CA coprocessors. This parallel access results 
in an ability to execute one simple logic or arithmetic 
instruction, one memory operation (load/store), and
one address calculation per clock. 


MEM coprocessors interface to the RF with a 128-bit 
wide load bus and a 128-bit wide store bus. These bus- 
ses enable movement of up to 4 words per clock to and 
from the RF. These busses also allow LOAD data from 
a previous read access and STORE data from a current 
write access to be processed in the register file simulta- 
neously. An additional 32-bit port allows an address or 
address reduction operand to be simultaneously fetched 
by the Address Generation Unit. 


REG coprocessors interface to the RF with two 64-bit 
source busses and a single 64-bit destination bus. With 
this bus structure, two source operands are simulta- 
neously issued to a REG coprocessor when an instruc- 
tion is issued. A 64-bit destination bus allows the result 
from the previous operation to be written to the RF at 
the same time that the current operation’s source
operands are issued.


A.3 Execution Unit


The Execution Unit is the 32-bit Arithmetic and Logic 
Unit of the 80960CA Core. The EU can be viewed as a 
self-contained REG coprocessor with its own instruc- 
tion set. As such, the EU is responsible for executing or 
supporting the execution of all the integer and ordinal 
arithmetic instructions, the logic and shift instructions, 
the move instructions, the bit and bit field instructions, 
and the compare operations. The EU performs any 
arithmetic or logical instructions in a single clock. 

A.4 Multiply Divide Unit


The Multiply and Divide Unit (MDU) is a REG
coprocessor which performs integer and ordinal multiply,
divide, remainder, and modulo operations. The MDU
detects integer overflow and divide by zero errors. The
MDU is optimized for multiplication, performing
32-bit multiplies in 4 clocks. The MDU performs
multiplies and divides in parallel with the main execution
unit.


A.5 Address Generation Unit 


The Address Generation Unit (AGU) is a MEM
coprocessor which computes the effective addresses for
memory operations. It directly executes the load address
instruction (lda) and calculates addresses for loads and
stores based on the addressing mode specified in these
instructions. The address calculations are performed in
parallel with the main execution unit (EU).


A.6 Data RAM and Local Register Cache


The Data RAM and Local Register Cache is part of a
1.5 Kbyte block of on-chip Static RAM (SRAM).
1 Kbyte of this SRAM is mapped into the 80960CA’s
address space from location 00000000H to
000003FFH. A portion of the remaining 512 bytes is
dedicated to the Local Register Cache. This part of
internal SRAM is not directly visible to the user. Loads
and Stores, including quad-word accesses, to the
internal SRAM are typically performed in only one clock.
The complete local register set, therefore, can be moved
to the local register cache in only four clocks.
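
The figures in the paragraph above follow from simple arithmetic, which the sketch below works through. The constant names are illustrative only; the values are the ones quoted in the text (1.5 Kbyte SRAM, 1 Kbyte Data RAM, 16 local registers of 32 bits, one quad-word access per clock).

```python
SRAM_BYTES = 1536          # 1.5 Kbyte block of on-chip SRAM
DATA_RAM_BYTES = 1024      # portion mapped into the address space
DATA_RAM_BASE = 0x00000000

# Last mapped address and the bytes left over for the register cache.
data_ram_last = DATA_RAM_BASE + DATA_RAM_BYTES - 1
remaining_bytes = SRAM_BYTES - DATA_RAM_BYTES

LOCAL_REGISTERS = 16
BYTES_PER_REGISTER = 4     # 32-bit registers
QUAD_WORD_BYTES = 16       # one quad-word moved per clock

frame_bytes = LOCAL_REGISTERS * BYTES_PER_REGISTER
clocks_per_frame = frame_bytes // QUAD_WORD_BYTES

print(hex(data_ram_last), remaining_bytes, frame_bytes, clocks_per_frame)
```

One local register set is 64 bytes, so four quad-word transfers move it in four clocks, matching the text.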


Figure A-3. Six-Port Register File
(Diagram labels: 16 Local Registers, 16 Global Registers, REG data busses, MEM data busses.)

80960CA-33, -25, -16
32-BIT HIGH PERFORMANCE EMBEDDED PROCESSOR


• Two Instructions/Clock Sustained Execution
• Four 59 Mbytes/s DMA Channels with Data Chaining
• Demultiplexed 32-Bit Burst Bus with Pipelining


32-bit Parallel Architecture

— Two Instructions/clock Execution 
— Load/Store Architecture 

— 16, 32-bit Global Registers 

— 16, 32-bit Local Registers 

— Manipulate 64-Bit Bit Fields 

— 11 Addressing Modes 

— Full Parallel Fault Model 

— Supervisor Protection Model 


Fast Procedure Call/Return Model 


— Full Procedure Call in 4 clocks


— RISC Call in 2 clocks (BAL) 


On-Chip Register Cache 

— Caches Registers on Call/Ret 

— Minimum of 6 Frames provided 

— Number of Frames Programmable, 
up to 15


On-Chip Instruction Cache 

— 1 Kbyte Two-Way Set Associative 

— 128-bit Path to Instruction Sequencer 
— Cache-Lock Modes 

— Cache-Off Mode 


High Bandwidth On-Chip Data RAM
— 1 Kbyte On-chip RAM for Data
— Sustain 128-bits per clock access


Four On-Chip DMA Channels 

— 59 Mbytes/s Fly-by Transfers 

— 32 Mbytes/s Two-Cycle Transfers 
— Data Chaining

— Data Packing/Unpacking

— Programmable Priority Method 


32-Bit Demultiplexed Burst Bus

— 128-Bit Internal Data Paths to and
from Registers

— Burst Bus for DRAM Interfacing 

— Address Pipelining Option 

— Fully Programmable Wait States

— Supports 8, 16 or 32-bit Bus Widths

— Supports Unaligned Accesses

— Supervisor Protection Pin 


High-Speed Interrupt Controller 
— Up to 248 External Interrupts 
— 32 Fully Programmable Priorities 
— Multi-mode 8-bit Interrupt Port
— Four Internal DMA Interrupts
— Separate, Non-maskable Interrupt Pin
— Context Switch in 750 ns Typical 

1.0 PURPOSE


This document provides a preview of the electrical 
characteristics expected of the 33, 25 and 16 MHz 
versions of the 80960CA. For a detailed description 
of any 80960CA functional topic, other than para- 
metric performance, consult the latest 80960CA 
Product Overview (Order No. 270669), or the 
80960CA User’s Manual (Order No. 270710). 


2.0 80960CA OVERVIEW 


The 80960CA is the second-generation member of 
the 80960 Family of embedded processors. The 
80960CA is object code compatible with the 32-bit 
80960 Core Architecture while including Special 
Function Register extensions to control on-chip pe- 
ripherals, and instruction set extensions to shift 64- 
bit operands and configure on-chip hardware. Multi- 
ple 128-bit internal busses, on-chip instruction cach- 
ing and a sophisticated instruction scheduler allow 
the processor to sustain execution of two instructions
every clock, with a peak of three instructions
per clock.


(Block diagram labels recoverable from Figure 2: Instruction Cache, Programmable Interrupt Controller, Six-Port Register File, 64-bit SRC1 bus, 64-bit DST bus, Parallel Instruction Scheduler, Register-side Machine Bus, Memory-side Machine Bus.)

A 32-bit demultiplexed and pipelined burst bus provides
a 132 Mbyte/s bandwidth to a system’s high-
speed external memory sub-system. In addition, the 
80960CA’s on-chip caching of instructions, proce- 
dure context and critical program data substantially 
decouples system performance from the wait states 
associated with accesses to the system’s slower, 
cost sensitive, main memory sub-system. 


The 80960CA bus controller also integrates full wait 
state and bus width control for highest system per- 
formance with minimal system design complexity. 
Unaligned access and Big Endian byte order support 
reduces the cost of porting existing applications to 
the 80960CA.


The processor also integrates four complete data- 
chaining DMA channels and a high-speed interrupt 
controller on-chip. The DMA channels perform
single-cycle or two-cycle transfers, data packing and
unpacking, and data chaining. Block transfers, in
addition to source or destination synchronized
transfers, are provided.


The interrupt controller provides full programmability
of 248 interrupt sources into 32 priority levels with a
typical interrupt task switch (“latency”) time of
750 ns.


Figure 2. 80960CA Block Diagram

2.1. The C-Series Core


The C-Series core is a very high performance
microarchitectural implementation of the 80960 Core
Architecture. The C-Series core can sustain execution
of two instructions per clock (66 MIPS at 33 MHz).
To achieve this level of performance, Intel has
incorporated state-of-the-art silicon technology and
innovative microarchitectural constructs into the
implementation of the C-Series core. Factors that
contribute to the core’s performance include:


— Parallel instruction decoding allows issue of up
to three instructions per clock.

— Most instructions execute in a single clock.

— Parallel instruction decode allows sustained,
simultaneous execution of two single-clock
instructions every clock cycle.

— Efficient instruction pipeline is designed to
minimize pipeline break losses.

— Register and resource scoreboarding allow
simultaneous multi-clock instruction execution.

— Branch look-ahead and prediction allows many
branches to execute with no pipeline break.

— Local Register Cache integrated on-chip caches
Call/Return context.

— Two-way set associative, 1 Kbyte integrated
instruction cache.

— 1 Kbyte integrated Data RAM sustains a four-
word (128-bit) access every clock cycle.


2.2. Pipelined, Burst Bus 


A 32-bit high performance bus controller interfaces
the 80960CA to external memory and peripherals.
The Bus Control Unit features a maximum transfer
rate of 132 Mbytes per second (at 33 MHz). Internally
programmable wait states and 16 separately
configurable memory regions allow the processor to
interface with a variety of memory subsystems with a
minimum of system complexity and a maximum of
performance. The Bus Controller’s main features
include:

— Demultiplexed, Burst Bus to exploit most efficient
DRAM access modes

— Address Pipelining to reduce memory cost while
maintaining performance

— 32-, 16- and 8-bit modes for I/O interfacing ease

— Full internal wait state generation to reduce
system cost

— Little and Big Endian support to ease application
development

— Unaligned access support for code portability

— Three-deep request queue to decouple the bus
from the core

— Direct interface to Intel’s 27C960 Burst EPROM
and 82596 Ethernet Controller
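
The 132 Mbytes per second figure quoted above is consistent with one 32-bit transfer on every clock at 33 MHz. The sketch below shows that arithmetic. The function name is hypothetical, and the results for narrower bus widths are derived on the assumption of one transfer per clock with no wait states; they are not figures taken from this data sheet.

```python
def peak_bus_mbytes_per_s(clock_mhz, bus_width_bits=32):
    """Peak transfer rate assuming one transfer per clock, no wait states."""
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * bytes_per_transfer


for width in (32, 16, 8):
    print(width, peak_bus_mbytes_per_s(33, width))
```

Wait states or non-burst accesses lower the achieved rate below this peak.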


2.3. Flexible DMA Controller


A four channel DMA controller provides high speed 
DMA control for data transfers involving peripherals 
and memory. The DMA provides advanced features 
such as data chaining, byte assembly and disassem- 
bly, and a high performance fly-by mode capable of 
transfer speed of up to 59 Mbytes per second at 
33 MHz. The DMA controller features a performance
and flexibility which is only possible by integrating
the DMA controller and the 80960CA core.


2.4. Priority Interrupt Controller 


A programmable-priority interrupt controller man- 
ages up to 248 external sources through the 8-bit 
external interrupt port. The Interrupt Unit also han- 
dles the 4 internal sources from the DMA controller, 
and a single non-maskable interrupt input. The 8-bit
interrupt port can also be configured to provide
individual interrupt sources that are level or edge
triggered.


Interrupts in the 80960CA are prioritized and
signaled within 270 ns of the request. If the interrupt is
of higher priority than the processor priority, the
context switch to the interrupt routine typically is
complete in another 480 ns. The interrupt unit provides
the mechanism for the low latency and high throughput
interrupt service which is essential for embedded
applications.
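
The two latency components above add up to the 750 ns typical context switch time quoted elsewhere in this document. The short sketch below shows the sum and converts it to processor clocks. The function name is illustrative, and the clock count is a derived figure that assumes a 33 MHz processor clock.

```python
SIGNAL_NS = 270          # prioritize and signal the interrupt
CONTEXT_SWITCH_NS = 480  # typical switch to the interrupt routine

total_ns = SIGNAL_NS + CONTEXT_SWITCH_NS


def ns_to_clocks(ns, clock_mhz):
    """Convert a time in nanoseconds to clock periods at clock_mhz."""
    return ns * clock_mhz / 1000


print(total_ns, ns_to_clocks(total_ns, 33))
```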

2.5. Instruction Set Summary


The following table summarizes the 80960CA instruction set by logical groupings. See the 80960CA User’s 
Manual for a complete description of the instruction set. 


Data Movement     Arithmetic            Logical          Bit, Bit Field and Byte
Load              Add                   And              Set Bit
Store             Subtract              Not And          Clear Bit
Move              Multiply              And Not          Not Bit
Load Address      Divide                Or               Alter Bit
                  Remainder             Exclusive Or     Scan for Bit
                  Modulo                Not Or           Span over Bit
                  Shift                 Or Not           Extract
                  *Extended Shift       Nor              Modify
                  Extended Multiply     Exclusive Nor    Scan Byte for Equal
                  Extended Divide       Not
                  Add with Carry        Nand
                  Subtract with Carry
                  Rotate

Comparison              Branch                 Call/Return       Fault
Compare                 Unconditional Branch   Call              Conditional Fault
Conditional Compare     Conditional Branch     Call Extended     Synchronize Faults
Compare and Increment   Compare and Branch     Call System
Compare and Decrement                          Return
Condition Test                                 Branch and Link
Check Bit

Debug                   Processor Management         Atomic
Modify Trace Controls   Modify Process Controls      Atomic Add
Mark                    Modify Arithmetic Controls   Atomic Modify
Force Mark              *System Control
                        *DMA Control
                        Flush Local Registers


NOTE: 
Instructions marked by (*) are 80960CA extensions to the 80960 instruction set. 

3.0 PACKAGE INFORMATION


3.1. Package Introduction 


This section describes the pins, pinouts and thermal
characteristics for the 80960CA in the 168-pin
Ceramic Pin Grid Array (PGA) package and the 196-pin
Plastic Quad Flat Package (PQFP). For complete
package specifications and information, see the Intel
Packaging Specification (Order # 231369).


3.2. Pin Descriptions 


The 80960CA pins are described in this section.
Table 1 presents the legend for interpreting the pin
descriptions in the following tables.


The pins associated with the 32-bit demultiplexed 
processor bus are described in Table 2. The pins 
associated with basic processor configuration and 
control are described in Table 3. The pins associated
with the 80960CA DMA Controller and Interrupt
Unit are described in Table 4.


Figure 3 provides an example pin description table
entry. The “I/O” signifies that the data pins are
input-output. The “S” indicates the pins are synchronous
to PCLK2:1. The “H(Z)” indicates that these
pins float while the processor bus is in a Hold
Acknowledge state. The “R(Z)” notation indicates that
the pins also float while RESET is low.


All pins float while the processor is in the ONCE™
mode.


Table 1. Pin Description Nomenclature

Symbol   Description
I        Input only pin
O        Output only pin
I/O      Pin can be either an input or output
-        Pins “must be” connected as described
S(...)   Synchronous. Inputs must meet setup and hold times relative to
         PCLK2:1 for proper operation of the processor. All outputs are
         synchronous to PCLK2:1.
           S(E) Edge sensitive input
           S(L) Level sensitive input
A(...)   Asynchronous. Inputs may be asynchronous to PCLK2:1.
           A(E) Edge sensitive input
           A(L) Level sensitive input
H(...)   While the processor’s bus is in the Hold Acknowledge or Bus
         Backoff state, the pin:
           H(1) is driven to Vcc
           H(0) is driven to Vss
           H(Z) floats
           H(Q) continues to be a valid output
R(...)   While the processor’s RESET pin is low, the pin:
           R(1) is driven to Vcc
           R(0) is driven to Vss
           R(Z) floats
           R(Q) continues to be a valid output


Name     Type    Description
D31:0    I/O     DATA BUS carries 32, 16 or 8-bit data quantities depending on
         S       bus width configuration. The least significant bit of the data
         H(Z)    is carried on D0 and the most significant on D31. When the bus
         R(Z)    is configured for 8 bit data, the lower 8 data lines, D7:0 are
                 used. For 16 bit data widths, D15:0 are used. For 32 bit data
                 the full data bus is used.

Figure 3. Example Pin Description Entry

Table 2. 80960CA Pin Description—External Bus Signals
Description



ADDRESS BUS carries the upper 30 bits of the physical address. A31 is the most
significant address bit and A2 is the least significant. During a bus access, A31:2
identify all external addresses to word (4-byte) boundaries. The byte enable
signals indicate the selected byte in each word. During burst accesses, A3 and A2
increment to indicate successive data cycles.


DATA BUS carries 32, 16 or 8-bit data quantities depending on bus width
configuration. The least significant bit of the data is carried on D0 and the most
significant on D31. When the bus is configured for 8 bit data, the lower 8 data
lines, D7:0 are used. For 16 bit bus widths, D15:0 are used. For 32 bit bus widths
the full data bus is used.


BYTE ENABLES select which of the four bytes addressed by A31:2 are active 
during an access to a memory region configured for a 32-bit data-bus width. BE3 
applies to D31:24; BE2 applies to D23:16; BE1 applies to D15:8; and BE0 applies
to D7:0.


32-bit bus:  BE3 - Byte Enable 3 - enable D31:24
             BE2 - Byte Enable 2 - enable D23:16
             BE1 - Byte Enable 1 - enable D15:8
             BE0 - Byte Enable 0 - enable D7:0

For accesses to a memory region configured for a 16-bit data-bus width, the
processor directly encodes BE3, BE1 and BE0 to provide BHE, A1 and BLE
respectively.

16-bit bus:  BE3 - Byte High Enable (BHE) - enable D15:8
             BE2 - Not used (is driven high or low)
             BE1 - Address Bit 1 (A1)
             BE0 - Byte Low Enable (BLE) - enable D7:0

For accesses to a memory region configured for an 8-bit data bus width, the
processor directly encodes BE1 and BE0 to provide A1 and A0 respectively.

8-bit bus:   BE3 - Not used (is driven high or low)
             BE2 - Not used (is driven high or low)
             BE1 - Address Bit 1 (A1)
             BE0 - Address Bit 0 (A0)
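
The byte enable encodings above can be summarized with a small model. This is a sketch only: the function name is hypothetical, enables are modeled as logical values (True meaning the byte lane takes part in the access) rather than electrical pin levels, and address bits are returned as 0 or 1. Unused pins are returned as None.

```python
def encode_byte_enables(bus_width, address, nbytes):
    """Return the logical meaning carried by BE3:0 for one access."""
    if bus_width == 32:
        offset = address & 3
        if nbytes < 1 or offset + nbytes > 4:
            raise ValueError("access must fit inside one 32-bit word")
        lanes = range(offset, offset + nbytes)
        return {f"BE{i}": (i in lanes) for i in range(4)}
    if bus_width == 16:
        offset = address & 1
        if nbytes < 1 or offset + nbytes > 2:
            raise ValueError("access must fit inside one 16-bit half-word")
        return {
            "BE3": offset + nbytes == 2,  # BHE: D15:8 takes part
            "BE2": None,                  # not used
            "BE1": (address >> 1) & 1,    # carries address bit A1
            "BE0": offset == 0,           # BLE: D7:0 takes part
        }
    if bus_width == 8:
        if nbytes != 1:
            raise ValueError("an 8-bit bus moves one byte per access")
        return {
            "BE3": None,                  # not used
            "BE2": None,                  # not used
            "BE1": (address >> 1) & 1,    # carries address bit A1
            "BE0": address & 1,           # carries address bit A0
        }
    raise ValueError("bus width must be 8, 16 or 32")


print(encode_byte_enables(32, 0x1002, 2))
print(encode_byte_enables(16, 0x1003, 1))
print(encode_byte_enables(8, 0x1002, 1))
```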


WRITE/READ is low (0) for read requests and high (1) for write requests. The 
W/R signal changes in the same clock cycle as ADS. It remains valid for the entire 
access in non-pipelined regions. In pipelined regions, W/R may not be valid in the
last cycle of a read access.


ADDRESS STROBE indicates valid address and the start of a new bus access. 
ADS is asserted for the first clock of a bus access. 


READY is an input which signals the termination of a data transfer. READY is 
used to indicate that read data on the bus is valid, or that a write-data transfer has 
completed. The READY signal works in conjunction with the internally 
programmed wait-state generator. If READY is enabled in a region, the pin is 
sampled after the programmed number of wait-states has expired. If the READY 
pin is deasserted high, wait states will continue to be inserted until READY 
becomes asserted low. This is true for the NRAD, NRDD, NWAD, and NWDD wait
states. The NXDA wait states cannot be extended.
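
The interaction between the programmed wait states and the READY pin can be sketched as follows. The function name and the list-of-samples input are hypothetical; the model only captures the rule stated above, that READY is sampled once the programmed wait states have expired and that each clock with READY deasserted (high) adds one more wait state.

```python
def total_wait_states(programmed, ready_samples):
    """Count wait states for one data cycle.

    programmed    -- wait states from the internal wait-state generator
    ready_samples -- READY pin levels seen on successive clocks after the
                     programmed count expires (0 = asserted low,
                     1 = deasserted high)
    """
    total = programmed
    for level in ready_samples:
        if level == 0:       # READY asserted: the transfer terminates
            return total
        total += 1           # READY deasserted: insert another wait state
    raise ValueError("READY was never asserted")


print(total_wait_states(2, [1, 1, 0]))
print(total_wait_states(0, [0]))
```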

Table 2. 80960CA Pin Description—External Bus Signals (Continued)
Description 


BURST TERMINATE breaks up a burst access and
causes another address cycle to occur. The BTERM signal works in conjunction
with the internally programmed wait-state generator. If READY and BTERM are
enabled in a region, the BTERM pin is sampled after the programmed number of
wait states has expired. When BTERM is asserted, additional wait states are
inserted until BTERM is deasserted. When BTERM is deasserted, a new ADS
signal is generated and the access is completed. The READY input is ignored
when BTERM is asserted. BTERM must be externally synchronized to satisfy the
BTERM setup and hold times.


WAIT indicates the status of the internal wait state generator. WAIT is active 
when wait states are being caused by the internal wait state generator and not by 
the READY or BTERM inputs. WAIT can be used to derive a write-data strobe.
WAIT can also be thought of as a READY output that the processor provides
when it is inserting wait states.


BURST LAST indicates the last transfer in a bus access. BLAST is asserted in the 
last data transfer of burst and non-burst accesses after the wait state counter 
reaches zero. BLAST remains active until the clock following the last cycle of the 
last data transfer of a bus access. If the READY or BTERM input is used to extend 
wait states, the BLAST signal remains active until READY or BTERM terminates 
the access. 


DATA TRANSMIT/RECEIVE indicates direction for data transceivers. DT/R is
used in conjunction with DEN to provide control for data transceivers attached to 
the external bus. When DT/R is low (0), the signal indicates that the processor will 
receive data. Conversely, when high (1) the processor will send data. DT/R will 
change only while DEN is high. 


DATA ENABLE indicates data cycles in a bus access. DEN is asserted (low) at
the start of the first data cycle of a bus request and is deasserted (high) at the end
of the last data cycle. DEN is used in conjunction with DT/R to provide control for
data transceivers attached to the external bus. DEN remains asserted for 
sequential reads from pipelined memory regions. DEN is high when DT/R 
changes. 


BUS LOCK indicates that an atomic read-modify-write operation is in progress. 
LOCK may be used to prevent external agents from accessing memory which is 
currently involved in an atomic operation. LOCK is asserted (0) in the first clock of 
an atomic operation, and deasserted in the clock cycle following the last bus 
access for the atomic operation. To allow the most flexibility for a memory system 
enforcement of locked accesses, the processor will acknowledge a bus hold
request when LOCK is asserted. The processor will perform DMA transfers while
LOCK is active.


HOLD REQUEST signals that an external agent requests access to the external 
bus. The processor asserts HOLDA after completing the current bus request. 
HOLD, HOLDA, and BREQ are used together to arbitrate access to the 
processor’s external bus by external bus agents. 


BUS BACKOFF: the BOFF pin, when asserted (0), suspends the
current access and causes the bus pins to float. When the pin is deasserted (1),
the ADS signal is asserted on the next clock cycle and the access is resumed.

Table 2. 80960CA Pin Description—External Bus Signals (Continued)


HOLD ACKNOWLEDGE indicates to a bus requestor that the processor has 
relinquished control of the external bus. When HOLDA is asserted, the external
address bus, data bus, and bus control signals are floated. HOLD, BOFF, HOLDA 
and BREQ are used together to arbitrate access to the processor’s external bus 
by external bus agents. Since the processor will grant HOLD requests and enter 
the Hold Acknowledge state even while RESET is active, the state of the HOLDA 
pin will be independent of the RESET pin. 


BUS REQUEST indicates that the processor wishes to perform a bus request. 
BREQ can be used by external bus arbitration logic in conjunction with HOLD and 
HOLDA to determine when to return mastership of the external bus to the
processor. 


DATA OR CODE indicates that a bus request is a data request (1) or an instruction
request (0). D/C has the same timing as W/R.


DMA ACCESS indicates whether the bus request was initiated by the DMA
controller. DMA will be asserted (low) for any DMA request. DMA will be
deasserted (high) for all other requests.


SUPERVISOR ACCESS indicates whether the bus request is issued while in 
supervisor mode. SUP will be asserted (low) when the request has supervisor 
privileges, and will be deasserted (high) otherwise. SUP can be used to isolate 
supervisor code and data structures from non-supervisor requests. 


Table 3. 80960CA Pin Description—Processor Control Signals 
Description 


RESET causes the chip to reset. When RESET is asserted (low), all external signals 
return to the reset state. When RESET is deasserted, initialization begins. When the 
two-x clock mode is selected, RESET must remain asserted for 16 PCLK2:1 cycles 
before being deasserted in order to guarantee correct initialization of the processor. 
When the one-x clock mode is selected, RESET must remain asserted for 10,000 
PCLK2:1 cycles before being deasserted in order to guarantee correct initialization of 
the processor. The CLKMODE pin selects one-x or two-x input clock division of the 
CLKIN pin. 
The processor’s Hold Acknowledge bus state functions while the chip is reset. If the
processor’s bus is in the Hold Acknowledge state when RESET is activated, the
processor will internally reset, but will maintain the Hold Acknowledge state on
external pins until the Hold request is removed. If a hold request is made while the
processor is in the reset state, the processor bus will assert HOLDA and enter the Hold
Acknowledge state.


FAIL indicates failure of the processor’s self-test performed at initialization. When
RESET is deasserted and the processor begins initialization, the FAIL pin is asserted
(0). An internal self-test is performed as part of the initialization process. If this self-test
passes, the FAIL pin is deasserted (1), otherwise it remains asserted. The FAIL pin is
reasserted while the processor performs an external bus self-confidence test. If this
self-test passes, the processor deasserts the FAIL pin and branches to the user’s
initialization routine, otherwise the FAIL pin remains asserted. Internal self-test and the
use of the FAIL pin can be disabled with the STEST pin.

Table 3. 80960CA Pin Description—Processor Control Signals (Continued)


SELF TEST causes the processor’s internal self-test feature to be enabled or
disabled at initialization. STEST is read on the rising edge of RESET. When asserted
(high) the processor’s internal self-test and external bus confidence tests are
performed during processor initialization. When deasserted (low), the internal
self-test is not performed during initialization; only the external bus confidence
test is performed.


ON CIRCUIT EMULATION causes all outputs to be floated when asserted (low).
ONCE is continuously sampled while RESET is low, and is latched on the rising edge
of RESET. To place the processor in the ONCE state:
(1) assert RESET and ONCE (order does not matter)
(2) wait for at least 16 CLKIN periods in two-x mode, or 10,000 CLKIN periods in
    one-x mode, after Vcc and CLKIN are within operating specifications
(3) deassert RESET
(4) wait at least 32 CLKIN periods

(The processor will now be latched in the ONCE state as long as RESET is high.)

To exit the ONCE state, bring Vcc and CLKIN to operating conditions, then assert
RESET and bring ONCE high prior to deasserting RESET.

CLKIN must operate within the specified operating conditions of the processor until
step 4 above has been completed. The CLKIN may then be changed to DC to
achieve the lowest possible ONCE mode leakage current.


ONCE can be used by emulator products or for board testers to effectively make an 
installed processor transparent in the board. 
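
The ONCE entry sequence described in this entry can be written down as an ordered list of actions with their minimum CLKIN waits. The sketch below is illustrative only: the function names and the tuple layout are hypothetical, and the numbers are the minimum period counts quoted in the text.

```python
def once_entry_steps(two_x_mode):
    """Return (action, minimum CLKIN periods to wait) for ONCE entry."""
    reset_periods = 16 if two_x_mode else 10_000
    return [
        ("assert RESET and ONCE (either order)", 0),
        ("hold with Vcc and CLKIN in specification", reset_periods),
        ("deassert RESET", 0),
        ("wait before changing CLKIN", 32),
    ]


def min_clkin_periods(two_x_mode):
    """Total minimum CLKIN periods before CLKIN may be changed to DC."""
    return sum(periods for _, periods in once_entry_steps(two_x_mode))


for action, periods in once_entry_steps(two_x_mode=True):
    print(f"{periods:>6}  {action}")
print(min_clkin_periods(True), min_clkin_periods(False))
```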


CLOCK INPUT is an input for the external clock needed to run the processor. The
external clock is internally divided as prescribed by the CLKMODE pin to produce
PCLK2:1.


CLOCK MODE selects the division factor applied to the external clock input (CLKIN).
When CLKMODE is high (1), CLKIN is divided by one to create PCLK2:1 and the
processor’s internal clock. When CLKMODE is low (0), CLKIN is divided by two to
create PCLK2:1 and the processor’s internal clock. CLKMODE should be tied high or
low in a system, as the clock mode is not latched by the processor. If left
unconnected, the processor will internally pull the CLKMODE pin low (0), enabling the
two-x clock mode.
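
The effect of CLKMODE on the processor clock, and on the minimum RESET time given in the RESET entry of this table, can be worked through numerically. This is a sketch with hypothetical function names, and the CLKIN frequencies used are example values only. The cycle counts (16 PCLK2:1 cycles in two-x mode, 10,000 in one-x mode) come from the RESET description.

```python
def pclk_mhz(clkin_mhz, clkmode_high):
    """PCLK2:1 frequency: CLKIN divided by one (high) or by two (low)."""
    return clkin_mhz / (1 if clkmode_high else 2)


def min_reset_us(clkin_mhz, clkmode_high):
    """Minimum time RESET must stay asserted, in microseconds."""
    cycles = 10_000 if clkmode_high else 16  # one-x mode vs two-x mode
    return cycles / pclk_mhz(clkin_mhz, clkmode_high)


# Two ways to run the core at 33 MHz (example CLKIN values).
print(pclk_mhz(66, clkmode_high=False), min_reset_us(66, clkmode_high=False))
print(pclk_mhz(33, clkmode_high=True), min_reset_us(33, clkmode_high=True))
```

The one-x mode needs a much longer RESET pulse for the same processor clock.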


PROCESSOR OUTPUT CLOCKS provide a timing reference for all inputs and 
outputs of the processor. All inputs and output timings are specified in relation to 
PCLK2 and PCLK1. PCLK2 and PCLK1 are identical signals. Two output pins are
provided to allow flexibility in the system’s allocation of capacitive loading on the
clock. PCLK2:1 may also be connected at the processor to form a single clock signal.


GROUND connections consist of 24 pins which must be connected externally to a 
Vss board plane.


POWER connections consist of pins which must be connected externally to a Vcc
board plane.


NO CONNECT pins must not be connected in a system. 

Table 4. 80960CA Pin Description—DMA and Interrupt Unit Control Signals


A(E/L) 
H(Z) 
R(Z) 


DMA REQUEST causes a DMA transfer to be requested. Each of the four signals
requests a transfer on a single channel. DREQ0 requests channel 0, DREQ1
requests channel 1, etc. When two or more channels are requested simultaneously,
the channel with the highest priority is serviced first. The channel priority mode is
programmable.


DMA ACKNOWLEDGE indicates that a DMA transfer is being executed. Each of the 
four signals acknowledges a transfer for a single channel. DACK0 acknowledges
channel 0, DACK1 acknowledges channel 1, etc. DACK3:0 are active (0) when the 
requesting device of a DMA is accessed. 


END OF PROCESS/TERMINAL COUNT can be programmed as either an input 
(EOP3:0) or as an output (TC3:0), but not both. Each pin is individually 
programmable. When programmed as an input, EOPx causes the termination of a 
current DMA transfer for the channel corresponding to the EOPx pin. EOP0
corresponds to channel 0, EOP1 corresponds to channel 1, etc. When a channel is
configured for source and destination chaining, the EOP pin for that channel causes
termination of only the current buffer transferred and causes the next buffer to be
transferred. EOP3:0 are asynchronous inputs.


When programmed as an output, the channel’s TCx pin indicates that the channel
byte count has reached 0 and a DMA has terminated. TCx is driven with the same
timing as DACKx during the last DMA transfer for a buffer. If the last bus request is 
executed as multiple bus accesses, TCx will stay asserted for the entire bus request. 


EXTERNAL INTERRUPT PINS cause interrupts to be requested. These pins can be
configured in three modes.

In the Dedicated Mode, each pin is a dedicated external interrupt source. Dedicated
inputs can be individually programmed to be level (low) or edge (falling) activated.

In the Expanded Mode, the 8 pins act together as an 8-bit vectored interrupt source.
The interrupt pins in this mode are level activated. Since the interrupt pins are active
low, the vector number requested is the one's complement of the positive logic value
placed on the port. This eliminates glue logic to interface to combinational priority
encoders which output negative logic.

In the Mixed Mode, XINT7:5 are dedicated sources and XINT4:0 act as the 5 most
significant bits of an expanded mode vector. The least significant bits are set to 010
internally.
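
The vector arithmetic described for the Expanded and Mixed Modes can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the Mixed Mode function assumes that XINT4:0 are complemented in the same way as in the Expanded Mode, which the text above implies but does not state explicitly.

```python
def expanded_mode_vector(pin_value: int) -> int:
    """Vector requested in Expanded Mode.

    pin_value is the 8-bit positive-logic value placed on XINT7:0.
    Because the pins are active low, the requested vector is the
    one's complement of that value.
    """
    if not 0 <= pin_value <= 0xFF:
        raise ValueError("XINT7:0 value must fit in 8 bits")
    return ~pin_value & 0xFF


def mixed_mode_vector(xint4_0: int) -> int:
    """Vector requested in Mixed Mode.

    XINT4:0 supply the 5 most significant bits; the 3 least
    significant bits are fixed at 010. Complementing XINT4:0 here
    is an assumption carried over from the Expanded Mode.
    """
    if not 0 <= xint4_0 <= 0x1F:
        raise ValueError("XINT4:0 value must fit in 5 bits")
    upper = ~xint4_0 & 0x1F
    return (upper << 3) | 0b010


if __name__ == "__main__":
    print(hex(expanded_mode_vector(0b11101101)))
    print(hex(mixed_mode_vector(0b11100)))
```

An external priority encoder with negative-logic outputs can therefore drive the pins directly, since the complement is taken by the processor.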


NON-MASKABLE INTERRUPT causes a non-maskable interrupt event to occur. 
NMI is the highest priority interrupt recognized. NMI is an edge (falling) activated 
source. 
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3.3. 80960CA Pinout


3.3.1 80960CA CPGA PINOUT


Tables 5 and 6 list the 80960CA pin names with
package location. Figure 4a depicts the complete
80960CA pinout as viewed from the top side of the
component (i.e., pins facing down). Figure 4b shows
the complete 80960CA pinout as viewed from the
pin-side of the package (i.e., pins facing up). See
Section 4.0, Electrical Specifications for
specifications and recommended connections.


Table 5. PGA Pin Name with Package Location (Signal Order)


_ Address Bus” ‘Bus Control Processor Control. A a 
‘Name - Location |"Name_... Location 


RO3 | BES 


a ME 
Vss — Boers ae 
eee ne: 


_ Location | 


Cov, C08, CO9, 
C10, C11, C12, . 


F15, G03, G15,
H03, H15, J03,
J15, K03, K15,
L03, L15, M03,
M15, Q07, Q08,
-ROS | Q09,Q10,Q11.. | 
S04 en eC 
R13 XINTO 
B07, B09, B10, | 
B11, B12, C06, 
E15, F03, F16, 
G02, H16, J02,
J16, K02, K16, M02,
M16, N03, N15,
Q06, R07, R08,
R10, R11 


Co1 | BOFF......B01 | | NoConnect [| 
boo | Location _ 


A01, A03, A04, A05,
Bos, B04, C04, C05, 
D03
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Table 6. PGA Pin Name with Package Location (Pin Order) 
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Figure 4a. 80960CA PGA Pinout (View from Top Side) 
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Figure 4b. 80960CA PGA Pinout (View from Bottom Side) 
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3.3.2 80960CA PQFP Pinout

Tables 7 and 8 list the 80960CA pin names with
package location. See Section 4.0, Electrical Specifications for
specifications and recommended connections.


Table 7. PQFP Pin Name with Package Location (Signal Order)


Address Bus Bus Control Processor Control 
Name ........Location 


2,7, 16, 24, 30, 38, 
39, 49, 56, 70, 75, 

77, 81, 83, 88, 89, 
92, 98, 105, 109, 110, 
121, 125, 131, 135, 
147, 150, 161, 165, 
173, 174, 185, 196 


1, 12, 20, 28, 

32, 37, 44, 50, 
61,71, 72, 79, 

82, 96, 99, 103, 
115, 127, 140, 148, 
154, 168, 171, 180, 
190. 


29, 41, 42, 47, 
48, 51, 52, 53,
54, 55, 73, 76, 
80, 84, 86, 90, 97, 
104, 126, 146, 149, 157, 
166, 177, 183, 193 
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Table 8. PQFP Pin Name with Package Location (Pin Order) 
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Figure 4c. 80960CA PQFP Pinout (View from Top Side) | 
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3.4. Mechanical Data 
3.4.1 CERAMIC PGA PACKAGE 
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Figure 5. 168-Lead Ceramic PGA Package Dimensions 
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Table 9. Ceramic PGA Package Dimension Symbols 


Description of Dimensions 


[8d Diameter ofterminaeedpin 
ee 
mane eee 


._ Letter or 
Symbol | 


NOTES:
1. Controlling dimension: millimeter.
2. Dimension "e1" ("e") is non-cumulative.
3. Seating plane (standoff) is defined by P.C. board hole size: 0.0415-0.0430 inch.
4. Dimensions "B", "B1" and "C" are nominal.
5. Details of Pin 1 identifier are optional.
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3.4.2 PLASTIC QUAD FLAT PACKAGE 
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Figure 6. Principal Dimensions and Datums 
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Figure 7. Molded Details 


Figure 8. Detail M (dimensions in mm (inch))


Detail J and Detail L (dimensions in mm (inch))


Figure 10. Typical Lead


Table 10. PQFP Package Dimension Symbols 


Symbol | Description | Inch (Min, Max) | mm (Min, Max)
N | Leadcount | 196 | 196
 | Package Height | 0.160, 0.170 | 4.06, 4.32
 | Terminal Dimension | | 37.72
 | Package Body | | 34.37
 | Bumper Distance | | 38.18
 | Foot Radius Location | 1.423, 1.437 | 36.14, 36.49


NOTES:

1. All dimensions and tolerances conform to ANSI Y14.5M-1982.
2. Datum plane -H- located at the mold parting line and coincident with the bottom of the lead where lead exits plastic body.
3. Datums A-B and -D- to be determined where center leads exit plastic body at datum plane -H-.
4. Controlling Dimension, Inch.
5. Dimensions D1, D2, E1 and E2 are measured at the mold parting line. D1 and E1 do not include an allowable mold
protrusion of 0.18 mm (0.007 in) per side. D2 and E2 do not include a total allowable mold protrusion of 0.18 mm (0.007 in)
at maximum package size.
6. Pin 1 identifier is located within one of the two zones indicated.
7. Measured at datum plane -H-.
8. Measured at seating plane datum -C-.
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3.5. Package Thermal Specifications 


The 80960CA is specified for operation when Tc
(the case temperature) is within the range of 0°C to
100°C. Tc may be measured in any environment to
determine whether the 80960CA is within specified
operating range. The case temperature should be
measured at the center of the top surface, opposite
the pins. Refer to Figure 13.


Ta (the ambient temperature) can be calculated
from θCA (thermal resistance from case to ambient)
with the following equation:


Ta = Tc - P * θCA


Table 11 shows the maximum Ta allowable (without
exceeding Tc) at various airflows and operating
frequencies (fPCLK).


Note that Ta is greatly improved by attaching fins or
a heat sink to the package. P (the maximum power
consumption) is calculated by using the typical Icc
as tabulated in Section 4.4, DC Characteristics,
and Vcc of 5V.
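
As a worked illustration of the equation above, the following Python sketch computes the maximum allowable ambient temperature from the case temperature limit, the typical supply current and a case-to-ambient thermal resistance. The function name is hypothetical, and the θCA value in the example is a placeholder, not a figure from this data sheet; use the value for the actual package, heat sink and airflow.

```python
def max_ambient_temp_c(tc_max_c: float,
                       icc_typ_a: float,
                       theta_ca_c_per_w: float,
                       vcc_v: float = 5.0) -> float:
    """Return Ta = Tc - P * theta_CA, with P = Icc(typ) * Vcc."""
    power_w = icc_typ_a * vcc_v
    return tc_max_c - power_w * theta_ca_c_per_w


if __name__ == "__main__":
    # Tc limit of 100 C and 750 mA typical Icc (80960CA-33) as listed
    # in this data sheet; theta_CA = 10 C/W is an assumed placeholder.
    print(max_ambient_temp_c(100.0, 0.75, 10.0))
```

With these inputs P is 3.75 W, so every 1°C/W removed from θCA by a heat sink or added airflow raises the allowable ambient temperature by 3.75°C.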


Table 11. Maximum Ta at Various Airflows in °C (PGA Package Only)


Airflow-ft/min (m/sec)


fPCLK | 0 (0) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) | 1000 (5.07)


* 0.285” high unidirectional heat sink (Al alloy 6061, 50 mil fin width, 150 mil center-to-center fin spacing). 


PGA Thermal Resistance - °C/Watt


Airflow-ft/min (m/sec)
Parameter | 400 (2.03) | 600 (3.07) | 800 (4.06) | 1000 (5.07)
θJC Junction-to-Case (case measured as shown in Figure 13): 1.5 at all listed airflows
θCA Case-to-Ambient (No Heatsink)
θCA Case-to-Ambient (with Unidirectional Heatsink)*


NOTES:

1. This table applies to 80960CA PGA plugged into socket or soldered directly
into board.

2. θJA = θJC + θCA.

3. θJ-CAP = 4°C/W (approx.)
θJ-PIN = 4°C/W (inner pins) (approx.)
θJ-PIN = 8°C/W (outer pins) (approx.)


* 0, 385” high unidirectional heat sink (Al alloy 6061, 50:mil fin width, 150 mil 


center-to-center fin spacing). 


Figure 11. 80960CA PGA Package Thermal Characteristics 
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PQFP Thermal Resistance—C/Watt 


Airflow-ft/min (m/sec)
Parameter | 100 (0.50) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06)
θJC Junction-to-Case (case measured as shown in Figure 13): 5
θCA Case-to-Ambient (No Heatsink)


NOTES:

1. This table applies to 80960CA PQFP soldered directly into board.

2. θJA = θJC + θCA.
3. θJL = 18°C/Watt
θJB = 18°C/Watt


Figure 12. 80960CA PQFP Package Thermal Characteristics


168-pin PGA: measure PGA case temperature at center of top surface.
196-pin PQFP: measure PQFP temperature at center of top surface.


Figure 13. Measuring 80960CA PGA and PQFP Case Temperature 
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3.6 Stepping Register Information 


Upon Reset, Register G0 contains die stepping information.
The following figure shows how G0 is
configured. The most significant byte contains an
ASCII 0. The upper middle byte contains an ASCII C.
The lower middle byte contains an ASCII A. The
least significant byte contains the stepping number
in ASCII. G0 retains this information until it is written
over by the user program.


Table 12 contains a cross reference of the number
in the least significant byte of register G0 to the die
stepping number.


ASCII | 00 | 43 | 41 | Stepping Number
DECIMAL | 0 | C | A | Stepping Number
MSB ... LSB


Figure 14. Register G0
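
The byte layout of the register can be illustrated with a short Python sketch that splits a 32-bit value into the four fields described above. The function name is hypothetical, and the example value uses an arbitrary stepping byte (ASCII "1") purely for illustration; the actual stepping codes are those cross-referenced in Table 12.

```python
def decode_stepping_register(g0: int) -> dict:
    """Split the 32-bit reset value of the register into its fields.

    Layout, most significant byte first: 0, 'C', 'A', then the
    stepping number in ASCII.
    """
    raw = (g0 & 0xFFFFFFFF).to_bytes(4, "big")
    return {
        "msb": raw[0],                       # expected to read 0
        "device": raw[1:3].decode("ascii"),  # expected to read "CA"
        "stepping_code": chr(raw[3]),        # ASCII stepping number
    }


if __name__ == "__main__":
    # 0x31 (ASCII "1") in the low byte is an illustrative value only.
    print(decode_stepping_register(0x00434131))
```

Because the register is overwritten by normal program use, a program that needs the stepping information would have to read it before any other use of the register after reset.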


Table 12. Die Stepping Cross Reference


G0 Least Significant Byte | Die Stepping
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3.7 Suggested Sources for 80960CA 
Accessories 


The following are some suggested sources of accessories
for the 80960CA. They are not an endorsement
of any kind, nor a warranty of the performance
of any of the listed products and/or companies.


Sockets 


1. 3M Textool Test and Interconnection Products 
Department 
P.O. Box 2963 
Austin, TX 78769-2963 


2. Augat, Inc. 
Interconnection Products Group 
33 Perry Avenue 
P.O. Box 779 
Attleboro, MA 02703
(508) 222-2202 


3. Concept Manufacturing Inc.
(Decoupling Sockets)
43024 Christy Street 
Fremont, CA 94538 
(415) 651-3804 


Heat Sinks/ Fins 


1. Thermalloy, Inc. 
2021 West Valley View Lane
Dallas, TX 75381-0839 
(214) 243-4321 


2. EG & G Division 
60 Audubon Road 
Wakefield, MA 01880 
(617) 245-5900 
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4.0 ELECTRICAL SPECIFICATIONS 


* WARNING: Stressing the device beyond the "Absolute
Maximum Ratings" may cause permanent damage.
These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended
exposure beyond the "Operating Conditions"
may affect device reliability.


4.1 Absolute Maximum Ratings 


Parameter | Maximum Rating
Storage Temperature | -65°C to +150°C
Case Temperature Under Bias | -65°C to +110°C
Supply Voltage wrt. Vss | -0.5V to +6.5V
Voltage on Other Pins wrt. Vss | -0.5V to Vcc + 0.5V


4.2. Operating Conditions 
Operating Conditions (80960CA-33, -25, -16)


Symbol | Parameter | Min | Max | Units | Notes


Vcc | Supply Voltage
80960CA-33
80960CA-25
80960CA-16


fCLK2x | Input Clock Frequency (2-x Mode)
80960CA-33
80960CA-25
80960CA-16


fCLK1x | Input Clock Frequency (1-x Mode)
80960CA-33
80960CA-25
80960CA-16


Tc | Case Temperature Under Bias
80960CA-33, -25, -16
PGA Package
196-Pin PQFP


NOTE:
(1) When in the 1-x input clock mode, CLKIN is an input to an internal phase-locked loop and must maintain a minimum
frequency of 8 MHz for proper processor operation. However, in the 1-x Mode, CLKIN may still be stopped when the
processor either is in a reset condition or is reset. If CLKIN is stopped, the specified RESET low time must be provided once
CLKIN restarts and has stabilized.
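
The 1-x mode clock constraint in the note above can be expressed as a simple check. The sketch below is illustrative only; the function and constant names are hypothetical, and it models just the two conditions stated in the note (the 8 MHz minimum while operating, and the allowance for stopping CLKIN during reset).

```python
MIN_CLKIN_1X_HZ = 8_000_000  # minimum CLKIN frequency in 1-x mode


def clkin_1x_allowed(freq_hz: float, in_reset: bool) -> bool:
    """Return True if the CLKIN frequency is acceptable in 1-x mode.

    While the processor is held in reset the clock may be stopped;
    otherwise the internal phase-locked loop needs at least 8 MHz.
    """
    if in_reset:
        return True
    return freq_hz >= MIN_CLKIN_1X_HZ


def max_clkin_period_ns(min_freq_hz: float = MIN_CLKIN_1X_HZ) -> float:
    """Longest CLKIN period, in ns, that still meets the minimum frequency."""
    return 1e9 / min_freq_hz


if __name__ == "__main__":
    print(clkin_1x_allowed(4_000_000, in_reset=False))
    print(max_clkin_period_ns())
```

The 8 MHz minimum corresponds to a maximum CLKIN period of 125 ns. A design that stops the clock to save power must also hold RESET low for the specified time once CLKIN has restarted and stabilized, which this check does not model.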


4.3 Recommended Connections 


Power and ground connections must be made to 
multiple Vcc and Vss (GND) pins. Every 80960CA- 
based circuit board should include power (Vcc) and 
ground (Vss) planes for power distribution. Every 
Vcc pin must be connected to the power plane, and 
every Vss pin must be connected to the ground 
plane. Pins identified as “N.C.” must not be con- 
nected in the system. 


Liberal decoupling capacitance should be placed 
near the 80960CA. The processor can cause tran- 
sient power surges when its numerous output buff- 
ers transition, particularly when connected to large 
capacitive loads. 


Low inductance capacitors and interconnects are 
recommended for best high frequency electrical per- 
formance. Inductance can be reduced by shortening 
the board traces between the processor and decou- 
pling capacitors as much as possible. Capacitors 
specifically designed for PGA packages will offer the 
lowest possible inductance. 


For reliable operation, always connect unused inputs
to an appropriate signal level. In particular, any
unused interrupt (XINT, NMI) or DMA (DREQ) input
should be connected to Vcc through a pull-up resistor,
as should BTERM if not used. Pull-up resistors
should be in the range of 20 KΩ for each pin tied
high. If READY or HOLD are not used, the unused
input should be connected to ground. N.C. pins
must always remain unconnected. Refer to the
80960CA User's Manual for more information.
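
The recommendations for unused inputs can be captured as a small lookup, for example in a board-level design check script. This is a sketch under stated assumptions: the function name and return strings are hypothetical, and only the pins named in the paragraph above are covered.

```python
PULL_UP = "pull up to Vcc through a resistor of about 20 kOhm"
GROUND = "connect to ground"
NO_CONNECT = "leave unconnected"


def unused_pin_termination(pin: str) -> str:
    """Return the recommended treatment for an unused 80960CA pin."""
    name = pin.strip().upper()
    # Unused interrupt and DMA request inputs, and BTERM, are tied high.
    if name.startswith(("XINT", "DREQ")) or name in ("NMI", "BTERM"):
        return PULL_UP
    # Unused READY or HOLD inputs are tied low.
    if name in ("READY", "HOLD"):
        return GROUND
    # N.C. pins are never connected.
    if name in ("N.C.", "NC"):
        return NO_CONNECT
    raise ValueError(f"no recommendation recorded for pin {pin!r}")


if __name__ == "__main__":
    for pin in ("XINT3", "DREQ0", "NMI", "BTERM", "READY", "HOLD", "N.C."):
        print(pin, "->", unused_pin_termination(pin))
```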
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4.4. DC Specifications 


DC Characteristics 
(80960CA-33, -25, -16 under the conditions described in Section 4.2, Operating Conditions.)


Symbol | Parameter | Min | Max | Units | Notes

VIL | Input Low Voltage for all pins except RESET | V
VIH | Input High Voltage for all pins except RESET | V
VOL | Output Low Voltage
VOH | Output High Voltage | IOH = -1 mA: 2.4 | IOH = -200 µA: Vcc - 0.5
 | Input Low Voltage for RESET | -0.3
 | Input High Voltage for RESET | 3.5
ILI1 | Input Leakage Current for each pin except:
BTERM, ONCE, DREQ3:0, STEST,
EOP3:0/TC3:0, NMI, XINT7:0,
READY, HOLD, BOFF, CLKMODE
ILI2 | Input Leakage Current for:
BTERM, ONCE, DREQ3:0, STEST,
EOP3:0/TC3:0, NMI, XINT7:0, BOFF | (2)
ILI3 | Input Leakage Current for:
READY, HOLD, CLKMODE | VIN = 2.4V | (3)
 | Output Leakage Current | µA | 0.45V ≤ VOUT ≤ Vcc
Icc | Supply Current (80960CA-33)
Icc Max | 900 mA
Icc Typ | 750
Icc | Supply Current (80960CA-25)
Icc Max
Icc Typ
Icc | Supply Current (80960CA-16)
Icc Max | 550 mA
Icc Typ | 400


lonce | ONCE-mode Supply Current | Ll mA 


C , 
XINT7:0, NMI, BTERM, CLKMODE 1 pF Fo = 1 MHz 


| 2 | | 
Output Capacitance of each output pin i ae ore Fo = 1 MHz, (6) | 


NOTES:

(1) No pull-up or pull-down.

(2) These pins have internal pullup resistors.

(3) These pins have internal pulldown resistors.

(4) Measured at worst case frequency, Vcc and temperature, with device operating and outputs loaded to the test conditions
described in Section 4.5.1, AC Test Conditions.

(5) Icc Typical is not tested.

(6) Output Capacitance is the capacitive load of a floating output.

(7) CLKMODE pin has a pull down resistor only when ONCE pin is deasserted.


Input Capacitance for: 
CLKIN, RESET, ONCE, 
READY, HOLD, DREQ3:0, BOFF 
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4.5 AC Specification 


AC Characteristics — 80960CA-33 
(80960CA-33 only, under the conditions described in Section 4.2, Operating Conditions and Section 4.5.1,
AC Test Conditions.)


Symbol | Parameter | Min | Max | Units | Notes


INPUT CLOCK(10) 


ce 


CLKIN Period . . In One-X.Mode (fcLikix) 30.3 eas (, 12) 
! In Two-X Mode (fco_K2x) ae 15. (1) | 
= CLKIN Period Stability In One-X Mode (fork) | | 0.1% =| A | 413) | 


CLKIN High Time _ In One-X Mode (fcik1) 
In Two-X Mode (fcoiK2x) 


CLKIN Low Time In One-X Mode (fotki x) 
In Two-X Mode (foL_K2x) 


CLKIN Rise Time 
CLKIN Fall Time. 


OUTPUT CLOCKS(?) 
CLKIN to PCLK2:1 sae In One-X Mode (foiKk1x) v 3,13,14) 
3 In Two-X Mode (fo Ke x) (1,3) 
PCLK2:1.Period =~: InOne-X Mode (fotk1x): cae | (1,13) 
| | In Two-X Mode (ich ax) 2Tc | (1,3) 
| PCLK2:1 High Time : m/a)-2 | 2 | ns | (4,13) 
PCLK2:1 Low Time (1/2) — 2 we | ne | (1,13) 


ir ee Ee joins | 3) 
[Puke Fat Te 


SYNCHRONOUS OUTPUTS(10) _ 


Output Valid iad Output Hold 

Tovi; Tout 

Tove; Tou2 

Tova; Tous 

Tova, Tou 

Tovs, Tous 

Tove: TOH6 

Tov7, TOH7 | 

Tove, Tous 

Tova: Tous 

Tovio, TOH10 DACK3:0, EOP3: 0/TC3:0 
Tovi1, Tou11 D31:0 
Tovi12; Tou12 

Tovi3: ToH13 


OaAAOAA W OD WD W 


Input Setup 

Tist 

Tis2 | BOFF 

Tisg BTERM/READY 

Tisa eo ee HOLD 

Input Hold wy | | Lo acted 
Tiwi : : D31:0 |. : A 
Tipo . ae _ BOFF | . Fae tes (1,11) 
TiH3 _ BTERM/READY (1,11) 
Tin4 HOLD (1311) 
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AC Characteristics — 80960CA-33 
80960CA-33 only, under the conditions described in Section 4.2, Operating Conditions and Section 4.5.1, 
AC Test Conditions.) (Continued) : 


Symbol | Parameter | Min | Max | Units | Notes
RELATIVE OUTPUT TIMINGS(9,7) | 
A312 Valid to ADS Rising ae ee ee ce ee ee 


TAVSH2 BE3:0, W/R, SUP, D/C, 
DMA, DACK3:0 Valid to ADS Rising T-6 T+ 6 


A312 Valid to DEN Falling ee ee ee eee 


BES:0, W/R, SUP, INST, 


DMA, DACKS:0 Valid to DEN Falling 


a 
Nepete4 


RESET Input Setup f BY 
RESET Input Hold 
DREQ3:0 Input Setup 

: 


DREQ3:0 Input Hold 


Tis7 _XINT7:0, NMI Input Setup 
XINT7:0, NMI Input Hold 


NOTES:

(1) See Section 4.5.2, AC Timing Waveforms for waveforms and definitions.

(2) See Figure 22 for capacitive derating information for output delays and hold times.

(3) See Figure 23 for capacitive derating information for rise and fall times.

(4) Where N is the number of NRAD, NRDD, NWAD, or NWDD wait states that are programmed in the Bus Controller Region
Table. When there are no wait states in an access, WAIT never goes active.

(5) N = Number of wait states inserted with READY.

(6) Output Data and/or DT/R may be driven indefinitely following a cycle if there is no subsequent bus activity.

(7) See Notes 1, 2 and 3.

(8) Since asynchronous inputs are synchronized internally by the 80960CA they have no required setup or hold times in
order to be recognized and for proper operation. However, in order to guarantee recognition of the input at a particular rising
edge of PCLK2:1 the setup times shown must be met. Asynchronous inputs must be active for at least two consecutive
PCLK2:1 rising edges to be seen by the processor.

(9) These specifications are guaranteed by the processor.

(10) These specifications must be met by the system for proper operation of the processor.

(11) This timing is dependent upon the loading of PCLK2:1. Use the derating curves of Section 4.5.3 to adjust the timing for
PCLK2:1 loading.

(12) In the One-x input clock mode the maximum input clock period is limited to 125 ns while the processor is operating.
When the processor is in reset, the input clock may stop even in One-x mode.

(13) When in the One-x input clock mode, these specifications assume a stable input clock with a period variation of less
than ±0.1% between adjacent cycles.

(14) This parameter is not tested.

(15) Since asynchronous inputs are synchronized internally by the 80960CA, they have no required setup or hold times in
order to be recognized and for proper operation. However, in order to guarantee recognition of the input at a particular
falling edge of PCLK2:1 the setup times shown must be met. Asynchronous inputs must be active for at least two consecutive
PCLK2:1 falling edges to be seen by the processor.
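
Notes (8) and (15) require an asynchronous input to stay active across two consecutive PCLK2:1 edges. The Python sketch below estimates a conservative minimum pulse width for a signal that arrives with no fixed phase relationship to PCLK2:1. The bound is a derivation from the wording of the notes, not a data sheet specification: the function name is hypothetical, and the setup and hold values in the example are placeholders rather than figures from the tables above.

```python
def min_async_pulse_ns(t_pclk_ns: float,
                       t_setup_ns: float,
                       t_hold_ns: float) -> float:
    """Conservative pulse width that always spans two sampling edges.

    Worst case, the pulse begins just after the setup window of one
    edge, so almost a full period passes before the first usable
    edge. It must then stay active through the next edge plus hold.
    """
    return 2.0 * t_pclk_ns + t_setup_ns + t_hold_ns


if __name__ == "__main__":
    # 30 ns PCLK period with assumed 10 ns setup and 5 ns hold.
    print(min_async_pulse_ns(30.0, 10.0, 5.0))
```

A pulse that is phase-aligned to PCLK2:1 can be shorter (about one period plus setup and hold), so this figure is an upper estimate for unsynchronized sources.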
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| AC Characteristics — 80960CA-25 
(80960CA-25 only, under the conditions described in Section 4.2, Operating Conditions and Section 4.5.1,
AC Test Conditions.)


Symbol | Parameter | Min | Max | Units | Notes


INPUT CLOCK(10) 


te CLKIN Frequency a ee eS ee 

EIN Period In One-X Mode (fcoik1x) 40 ie (1, = 

In Two-X Mode (fcLKe2x) 20 (1) 

a0 CLKIN Period Stability In One-X Mode (fait) a (1,13) 
ett High Time In One-X Mode (fcoik1x) 8 ee . (1,12): |. 

In Two-X Mode (fcLk2x) 8 | Hie (1) 

L. CLKIN Low Time In One-X Mode (fcoik1x) 8 e 5 (1,12) 

| In Uda X Mode (fcikex) 8 (1) 
ke CLKIN Rise Time | es or ee 
ie CLKIN Fall Time goo ee 


OUTPUT CLOCKS(®)_ 
In One-X Mode (foi Kix) (1,3,13,14) 
In Two-X Mode (foLike2x) (1,3) 
~ In One-X Mode (foLkix) |. 7 (1,13) 
In Two-X Mode (icuex) | a (1,3) 
T PCLK2:1 High Time. (1/2) — 3 a al ae (1.13) 
_ PCLK2:1 Low Time | | aray-3 2. | et (1,13) 


~ SYNCHRONOUS aneian 


Output Valid ser Output Hold 
Tovi; Tout 
Tove, Tou2 
Tova: Tous 
Tova: ToH4 
Tovs; ToHs 
Tove: Tous 


CLKIN to PCLK2:1 Delay 


PCLK2:1 Period 


BLAST, WAIT. 


Tov7; TOH7 
Tove; ToHs 
Tova: ToHa 


| DEN 
HOLDA, BREQ 
- LOCK 


Tovi0; TOH10 DACK3:0 0, EOP3:0/TC3:0 
Tovit» TOH11 D31:0 
Tovi2, ToH12 DT/R 


ORRAWHAWHDWW 


—Tovi3: ToH13 FAIL 


| Input Setup aan . 

—7Ths1 p31 0 
Tis2_ , | BOFF 
Tisg | ae BTERM/READY 
isa | | a HOLD 


Input Hold a 
TiH1 D31:0 
TiH2 BOFF 
TiH3 ; BTERM/READY 
Tia HOLD 
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AC Characteristics — 80960CA-25 | 
(80960CA-25 only, under the conditions described in Section 4.2, Operating Conditions and Section 4.5.1, 
AC Test Conditions.) (Continued) 


Symbol | Parameter | Min | Max | Units | Notes


RELATIVE OUTPUT TIMINGS(9,7) 


TavsHi | A31:2 Valid to ADS Rising T+4 [ns |. 


TAVSH2 BE3:0, W/R, SUP, D/C, 
DMA, DACK3:0 Valid to ADS Rising T+ 6 ns 
TAVERI: ASU 2 Vale DEN baling a ae es ee ee 


TAVEL2 BE3:0, W/R, SUP, INST, 
DMA, DACK3:0 Valid to DEN Falling 


Two | Ou DaaHodaterwarTAog | WeoT—« [Wen Tee | me | 6 
| on | @ | 


Trve | DT/R Valid to DEN Falling 
| ‘RELATIVE INPUT TIMINGS() | 

| Ts | RESETInputHoid | ts) 

[Tiss | OREGED nputseup das 

| Tie | DREGSOImputHold | is |) 


XINT7:0, NMI Input Setup a ee ee 
| Tia _|_XINT7:0, NI input Hold a: ee Sees i 


NOTES: 

(1) See Section 4.5.2, AC Timing Waveforms for waveforms and definitions. 

(2) See Figure 22 for capacitive derating information for output delays and hold times. 

(3) See Figure 23 for capacitive derating information for rise and fall times. 

(4) Where N is the number of NRAD, NRDD, NWAD, or NWDD wait states that are programmed in the Bus Controller Region
Table. When there are no wait states in an access, WAIT never goes active.

(5) N = Number of wait states inserted with READY. 

(6) Output Data and/or DT/R may be driven indefinitely following a cycle if there is no subsequent bus activity. 

(7) See Notes 1, 2 and 3.

(8) Since asynchronous inputs are synchronized internally by the 80960CA they have no required setup or hold times in
order to be recognized and for proper operation. However, in order to guarantee recognition of the input at a particular rising
edge of PCLK2:1 the setup times shown must be met. Asynchronous inputs must be active for at least two consecutive
PCLK2:1 rising edges to be seen by the processor.

(9) These specifications are guaranteed by the processor.

(10) These specifications must be met by the system for proper operation of the processor.

(11) This timing is dependent upon the loading of PCLK2:1. Use the derating curves of Section 4.5.3 to adjust the timing for
PCLK2:1 loading.

(12) In the One-x input clock mode the maximum input clock period is limited to 125 ns while the processor is operating. 
When the processor is in reset, the input clock may stop even in One-x mode. 

(13) When in the One-x input clock mode, these specifications assume a stable input clock with a period variation of less
than ±0.1% between adjacent cycles.

(14) This parameter is not tested.

(15) Since asynchronous inputs are synchronized internally by the 80960CA, they have no required setup or hold times in
order to be recognized and for proper operation. However, in order to guarantee recognition of the input at a particular
falling edge of PCLK2:1 the setup times shown must be met. Asynchronous inputs must be active for at least two consecutive
PCLK2:1 falling edges to be seen by the processor.
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| AC Characteristics — 80960CA-16 
(80960CA-16 only, under the conditions described in Section 4.2, Operating Conditions and Section 4.5.1, 
AC Test Conditions.) (Continued) 


Symbol | Parameter | Min | Max | Units | Notes


INPUT CLOCK(10) 
CLKIN Frequency a ee 
In One-X. Mode (foiK1) 62.5 ye (1,12) 
In Two-X Mode (fc. Ko») Ea 25 i 


CLKIN Period 

| Tog | CLKIN Period Stability In One-XMode (foci) | «| 0.1% | (4,13) | 
In One-X Mode (fcoik1x) 10 ee 5 (1,12) 
In Two-X Mode (foLk2x) 10 (1) 


TCH CLKIN High Time 
To CLKIN Low Time In One-X Mode (foik1,) 10 se (1, 12) 

In Two-X Mode (fcLk2x) |’ «10 (1) 

| Tor | CLKIN Rise Time ieee Ca Soe (1) 
be CLKIN Fall Time eae ae ee (1) 


OUTPUT CLOCKS(9) 


CLKIN to PCLK2:1 Delay In One-X Mode (fotki) 
In Two-X Mode (foci Key) 


(1,3,13,14) 
(1,3) 


(1,13) 
(1,3) 


(1,13) 
(1,13) 
(1,3) 
(1,3) 


In One-X Mode (fork) 
In Two-X Mode (fcLkex) 


n 


-| Output Valid Delay, Output Hold 
~Tovi; Tout A31:2 
Tova; ToH2 - | BE3:0 
Tovs; ToH3 AD 

Tova: ToH4 . 3 | | W/R 
Tovs: TOHS | 


Tove: ToHe | "BLAST. WAIT 
Tov7; TOH7 | | DEN 
Tove; Tous HOLDA, BREQ 
Tova: Tous | LOCK 

- Tovio: ToHi0 - DACK3:0 0, EOPS: 0/TC3:0 
Tovit, TOH11 D31:0 
Tovi2: ToH12 DT/R 
Tovi3, Tou13 FAIL 


| Tor | Output Float for all outputs see 


A SA alah INPUTS(10) 


Tis Input Setup 7 . | 7 
Tis — D31:0 5 
Tis2 BOFF 2 | 
Tisa : 2. _ BTERM/READY 9 
| Tisa | HOLD 9 
TH Input Hold 


TiH4 | D31:0 
TiH2 : BOFF 
TiH3 BTERM/READY 


TiH4 | 7 HOLD 
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5 8 AC Characteristics — 80960CA-16 
(80960CA-16 only, under the conditions described in Section 4.2, Operating Conditions and Section 4.5.1, 
AC Test Conditions.) (Continued) . 


Symbol | Parameter | Min | Max | Units | Notes


RELATIVE OUTPUT TIMINGS(9,7) 


91:2 Valid to ADS Rising a ae a ae 


TavsH2 | BE3:0,W/R,SUP,D/C, _ 
DMA, DACK3:0 Valid to ADS Rising T-6 T+ 6 ns 


TAVEL2 BE3:0, W/R, SUP, INST, 


+ 
| 
Oo 
+ 
(o>) 


DMA, DACK3:0 Valid to DEN Falling 
WAIT Falling to Output Data Valid 


Output Data Valid to WAIT Rising a 
WATT Falling to WATT Rising ns 


+ 


+4 


(N+1)*T—-4] (N+ 1)*T+41] = ns 


Output Data Hold after WAIT Rising 


DT/R Hold after DEN High 2-4 | «© | 
DT/R Valid to DEN Falling Tj2a-4 | T/atra 


RELATIVE INPUT TIMINGS(?) 
RESET Input Setup 


RESET Input Hold 
DREQ3:0 Input Setup | 
DREQ3:0 Input Hold 
XINT7:0, NMI Input Setup 
XINT7:0, NMI Input Hold 


NOTES: 

(1) See Section 4.5.2, AC Timing Waveforms for waveforms and definitions. 

(2) See Figure 22 for capacitive derating information for output delays and hold times. 

(3) See Figure 23 for capacitive derating information for rise and fall times. 

(4) Where N is the number of NRAD, NRDD, NWAD, or NWDD wait states that are programmed in the Bus Controller Region
Table. When there are no wait states in an access, WAIT never goes active.

(5) N = Number of wait states inserted with READY.

(6) Output Data and/or DT/R may be driven indefinitely following a cycle if there is no subsequent bus activity. 

(7) See Notes 1, 2 and 3. 

(8) Since asynchronous inputs are synchronized internally by the 80960CA they have no required setup or hold times in 
order to be recognized and for proper operation. However, in order to guarantee recognition of the input at a particular rising 
edge of PCLK2:1 the setup times shown must be met. Asynchronous inputs must be active for at least two consecutive 
PCLK2:1 rising edges to be seen by the processor. 

(9) These specifications are guaranteed by the processor. 

(10) These specifications must be met by the system for proper operation of the processor. 

(11) This timing is dependent upon the loading of PCLK2:1. Use the derating curves of Figure 22 to adjust the timing for 
PCLK2:1 loading.

(12) In the One-x input clock mode the maximum input clock period is limited to 125 ns while the processor is operating. 
When the processor is in reset, the input clock may stop even in One-x mode. 

(13) When in the One-x input clock mode, these specifications assume a stable input clock with a period variation of less 
than +0.1% between adjacent cycles. 

(14) This parameter is not tested. 

(15) Since asynchronous inputs are synchronized internally by the 80960CA, they have no required setup or hold times in 
order to be recognized and for proper operation. However, in order to guarantee recognition of the input at a particular 
falling edge of PCLK2:1 the setup times shown must be met. Asynchronous inputs must be active for at least two consecu- 
tive PCLK2:1 falling edges to be seen by the processor. 
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The AC Specifications in Section 4.5 are tested with 
the 50 pF load shown in Figure 15. See Figure 22 to
see how timings vary with load capacitance.
Specifications are measured at the 1.5V crossing
point, unless otherwise indicated. Input waveforms
are assumed to have a rise-and-fall time of < 2 ns
from 0.8V to 2.0V. See Section 4.5.2, AC Timing
Waveforms, for AC spec definitions, test points,
and illustrations.


CL = 50 pF for all signals


Figure 15. AC Test Load 


4.5.2. AC TIMING WAVEFORMS 
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Figure 16b. CLKIN Waveform 
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Figure 18a. Input Setup and Hold Waveform 


OUTPUT DELAY: The maximum output delay is referred to
as the Output Valid Delay (TOV). The minimum output delay is
referred to as the Output Hold (TOH).


OUTPUT FLOAT DELAY: The output float condition occurs
when the maximum output current becomes less than ILO in magnitude.


INPUT SETUP AND HOLD (TIS, TIH): The input setup and hold requirements
specify the sampling window during which synchronous inputs must be
stable for correct processor operation.
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Figure 18b. RESET, NMI, XINT7:0 Input Setup and Hold Waveform 




“ PCLK2:1 
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_(A31:2, 031:0, BE3:0,_ W 

ADS, BLAST, WAIT, W/R, ny .5V VALID 
___DT/A, DEN, __ 
LOCK, D/C, SUP, DMA) 




Figure 19. Hold Acknowledge Timings 
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Figure 21. Relative-timings Waveforms 
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—=——= All outputs except: LOCK, DMA, SUP, 
BREQ, DACK3:0, EOP3:0/TC3:0, FAIL 


== =10CK, DMA, SUP, BREQ, DACK3:0, 
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OUTPUT VALID DELAYS (ns) @ 1.5V 
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NOTE: 
PCLK Load = 50 pF 


Figure 22. Output Delay or Hold vs Load Capacitance 
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(a) All outputs except: LOCK, DMA, SUP, HOLDA, BREQ, 
DACK3:0, EOP3:0/TC3:0, FAIL 
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Icc—Icc under test conditions 


Figure 24. Icc vs Frequency and Temperature


5.0 RESET, BACKOFF AND HOLD ACKNOWLEDGE


The following table lists the condition of each proc- 
essor output pin while RESET is asserted (low). 


Table 13. Reset Conditions 


| 
(HOLDA inactive)1 
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NOTE: 


(1) With regard to bus output pin state only, the Hold
Acknowledge state takes precedence over the reset state.
Although asserting the RESET pin will internally reset the
processor, the processor’s bus output pins will not enter
the reset state if it has granted Hold Acknowledge to a
previous HOLD request (HOLDA is active). Furthermore, the
processor will grant new HOLD requests and enter the
Hold Acknowledge state even while in reset.

For example, if HOLDA is not active and the processor is
in the reset state, and HOLD is then asserted, the processor’s
bus pins will enter the Hold Acknowledge state and
HOLDA will be granted. The processor will not be able to
perform memory accesses until the HOLD request is removed,
even if the RESET pin is brought high. This operation is
provided to simplify boot-up synchronization among
multiple processors sharing the same bus.


| ... | Floating (set to input mode) |
| EOP/TC1 | Floating (set to input mode) |
| EOP/TC0 | Floating (set to input mode) |
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The following table lists the condition of each proc- 
essor output pin while HOLDA is asserted (low). 


Table 14. Hold Acknowledge 
and Backoff Conditions 
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Figure 26. Warm Reset Waveform | 
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Figure 27. Entering the ONCE™ State 
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NOTE: 
Case 1 and Case 2 show two possible polarities of PCLK2:1. 


Figure 28. Clock Synchronization in the 2x Clock Mode 




Region Table Entry: Bus Width, Pipelining, External Ready Control

270727-26


Figure 29. Non-Burst, Non-Pipelined Accesses without wait states 
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Figure 30. Non-Burst, Non-Pipelined Read with wait states 
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Figure 31. Non-Burst, Non-Pipelined Write with wait states 




Region Table Entry: Byte Order, Bus Width, External Ready
Control, Pipelining (bits 18-17, bits 16-12, bits 11-10,
bits 9-8)
Figure 32. Burst, Non-Pipelined Read without wait states, 32-bit bus 
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Figure 33. Burst, 


Pipelined Read with wait states, 32-bit bus 
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Figure 34. Burst, Non-Pipelined Write without wait states, 32-bit bus 






SUP, DMA, 
D/C, LOCK 


A31:4, BE3:0 


270727-32 


Figure 35. Burst, Non-Pipelined Write with wait states, 32-bit bus 
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Figure 36. Burst, Non-Pipelined Read with wait states, 16-bit bus 
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Figure 38. Non-Burst, Pipelined Read without wait states, 32-bit bus 
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Figure 39. Non-Burst, Pipelined Read with wait states, 32-bit bus 
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Figure 40. Burst, Pipelined Read without wait states, 32-bit bus 
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Figure 41. Burst, Pipelined Read with wait states, 
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Figure 42 
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Figure 43. Burst, Pipelined Read with wait states, 8-bit bus 
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_ Figure 44. Using External READY 
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NOTE: 

READY adds memory access time to data transfers, whether or not the bus access is a burst access. BTERM interrupts 
a bus access, whether or not the bus access has more data transfers pending. Either the READY signal or the BTERM 
signal will terminate a bus access if the signal is asserted during the last (or only) data transfer of the bus access. 


Figure 45. Terminating a Burst with BTERM 
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Figure 46. BOFF Functional Timing 
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Figure 47. HOLD Functional Timing 




Waveform labels: system clock; bus request; end of DMA
bus request (BLAST & READY); acknowledge; DACKx (all
modes); DREQx (Case 1, Note 1), high to prevent next DMA
cycle; DREQx (Case 2, Note 2), high to prevent next DMA
cycle.

270727-70


NOTES: 
1. Case 1: DREQ must deassert before DACK deasserts. Applications are Fly-by and some packing and unpacking 
modes, adjacent load-stores or store-loads, loads followed by loads, and stores followed by stores. 
2. Case 2: DREQ must be deasserted by the second clock (rising edge) after DACK is driven high. Applications are non 
fly-by transfers and adjacent load-stores or store-loads. 
3. DACKx is asserted for the duration of a DMA bus request. The request may consist of multiple bus accesses (defined
by ADS and BLAST). Refer to the User’s Manual for the definitions of “access” and “request”.


Figure 48. DREQ and DACK Functional Timing 
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NOTE: 

EOP has the same AC Timing Requirements as DREQ to prevent unwanted DMA requests. 

EOP is NOT edge triggered. EOP must be held for a minimum of 2 clock cycles and must then be deasserted
within 15 clock cycles.
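
The EOP pulse-width rule in the note above can be checked against a captured trace. The following is a minimal sketch, not taken from the data sheet: the function name `eop_pulse_ok`, the one-sample-per-clock trace format, the treatment of EOP as active low, and the reading of "within 15 clock cycles" as a maximum pulse width of 15 clocks are all assumptions made for illustration.

```python
def eop_pulse_ok(samples, min_cycles=2, max_cycles=15):
    """Check every EOP pulse in a per-clock trace (hypothetical helper).

    samples: sequence of 0/1 levels, one per clock; 0 is assumed to mean
    EOP asserted (active low). Returns True when every asserted run lasts
    between min_cycles and max_cycles clocks, inclusive.
    """
    run = 0  # length of the current asserted run, in clocks
    for level in samples:
        if level == 0:
            run += 1
        else:
            if run and not (min_cycles <= run <= max_cycles):
                return False
            run = 0
    # A pulse still asserted when the trace ends is judged on its length so far.
    return run == 0 or min_cycles <= run <= max_cycles


print(eop_pulse_ok([1, 0, 0, 0, 1]))   # three-clock pulse
print(eop_pulse_ok([1, 0, 1]))         # one-clock pulse, too short
```

A logic-analyzer export sampled on the rising edge of the clock could be fed to such a check directly; setup and hold margins are not modeled.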


Figure 49. EOP Functional Timing 
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NOTE: 

Terminal Count becomes active during the last bus request of a buffer transfer. If the last LOAD/STORE bus request is 
executed as multiple bus accesses, the TC will be active for the entire bus request. Refer to the User’s Manual for 
further information.


Figure 50. Terminal Count Functional Timing 
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Figure 51. FAIL Functional Timing 
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Figure 52. A Summary of Aligned and Unaligned Transfers for Little Endian Regions 
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Figure 53. A Summary of Aligned and Unaligned Transfers for Little Endian Regions (Continued) 
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i960™ WIC PROCESSOR 
PRODUCT OVERVIEW 


This chapter provides an overview of the architecture
of the i960 MC processor.


The i960 MC processor is the military-grade member of
a new family of processors from Intel. This processor 
family is based on a new 32-bit architecture called the 
i960 architecture. The i960 architecture has been de- 
signed specifically to meet the needs of embedded appli- 
cations such as avionics, aerospace, weapons systems, 
robotics and instrumentation, where high reliability is 
critical. It represents a renewed commitment from Intel 
to provide reliable, high-performance processors and 
controllers for the embedded processor marketplace. 


The i960 architecture can best be characterized as a
high-performance computing engine. It features high-speed
instruction execution and ease of programming. It is also
easily extensible, allowing processors and controllers
based on this architecture to be conveniently customized
to meet the needs of specific processing and control
applications.


Some of the important attributes of the i960 architecture
include:


• full 32-bit registers

• high-speed, pipelined instruction execution

• a convenient program execution environment with
  32 general-purpose registers and a versatile set of
  special-function registers

• a highly optimized procedure call mechanism that
  features on-chip caching of local variables and
  parameters

• extensive facilities for handling interrupts and faults

• extensive tracing facilities to support efficient
  program debugging and monitoring

• register scoreboarding and write buffering to permit
  efficient operation with lower performance memory
  subsystems


The i960 MC processor implements the i960 architec- 
ture, plus it offers several extensions to the architecture. 
Some of these extensions, such as on-chip support for 
floating-point arithmetic, virtual memory management 
and multitasking, are designed to enhance overall sys- 
tem performance. Several other extensions are designed 
to enhance system reliability and robustness. These ex- 
tensions include facilities for hardware enforced protec- 
tion of software modules and for creating fault tolerant 
systems through the use of redundant processors. 
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The following sections describe those features of the 
i960 architecture that are provided to streamline code 
execution and simplify programming. The extensions to 
this architecture provided in the i960 MC processor are 
described at the end of the chapter. 


HIGH PERFORMANCE PROGRAM 
EXECUTION 


Much of the design of the i960 architecture has been
aimed at maximizing the processor’s computational 
and data processing speed through increased parallel- 
ism. The following paragraphs describe several of the 
mechanisms and techniques used to accomplish this 
goal, including: 


• an efficient load and store memory-access model
• caching of code and procedural data
• overlapped execution of instructions
• many one or two clock-cycle instructions


Load and Store Model


One of the more important features of the i960 architecture
is that most of its operations are performed on
operands in registers, rather than in memory. For example,
all the arithmetic, logical, comparison, branching
and bit operations are performed with registers and
literals.


This feature provides two benefits. First, it increases
program execution speed by minimizing the number of
memory accesses required to execute a program. Second,
it reduces memory latency encountered when using
slower, lower-cost memory parts.


To support this concept, the architecture provides a
generous supply of general-purpose registers. For each
procedure, 32 registers are available (28 of which are
available for general use). These registers are divided
into two types: global and local. Both types of
registers can be used for general storage of operands.
The only difference is that global registers retain their
contents across procedure boundaries, whereas the
processor allocates a new set of local registers each time
a new procedure is called.
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The architecture also provides a-set of fast, versatile 
load and store instructions. These instructions allow 
burst transfers of 1, 2, 4, 8, 12 or 16 bytes of informa- 
tion between memory and the registers. 
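
As an illustration of the transfer sizes listed above, the sketch below breaks a block copy into a sequence of the permitted sizes, largest first. It is not part of the architecture definition: the function `plan_transfers` is hypothetical, and alignment rules that a real implementation imposes on multi-word loads and stores are deliberately ignored.

```python
# Transfer sizes in bytes, as listed in the text (1, 2, 4, 8, 12 or 16).
BURST_SIZES = (16, 12, 8, 4, 2, 1)


def plan_transfers(nbytes):
    """Return a list of transfer sizes that together cover nbytes.

    Greedy choice, largest size first. Address alignment is not modeled;
    this only shows how the listed sizes compose.
    """
    if nbytes < 0:
        raise ValueError("byte count must be non-negative")
    plan = []
    for size in BURST_SIZES:
        while nbytes >= size:
            plan.append(size)
            nbytes -= size
    return plan


print(plan_transfers(30))
print(plan_transfers(7))
```

Because a 1-byte transfer is available, any byte count can be covered; the larger sizes simply reduce the number of bus requests needed.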


On-Chip Caching of Code and Data 


To further reduce memory accesses, the architecture
offers two mechanisms for caching code and data on
chip: an instruction cache and multiple sets of local
registers. The instruction cache allows prefetching of
blocks of instructions from memory, which helps ensure
that the instruction execution pipeline is supplied with
a steady stream of instructions. It also reduces the
number of memory accesses required when performing
iterative operations such as loops. (The size of the
instruction cache can vary. With the i960 MC processor,
it is 512 bytes.)


To optimize the architecture’s procedure call mechanism,
the processor provides multiple sets of local registers.
This allows the processor to perform most procedure
calls without having to write the local registers out
to the stack in memory.


(The number of local-register sets provided depends on
the processor implementation. The i960 MC processor
provides four sets of local registers.)


Overlapped Instruction Execution


Another technique that the i960 architecture employs
to enhance program execution speed is overlapping the
execution of some instructions. This is accomplished
through two mechanisms: register scoreboarding and
branch prediction.


Register scoreboarding permits instruction execution to
continue while data is being fetched from memory.
When a load instruction is executed, the processor sets
one or more scoreboard bits to indicate the target registers
to be loaded. After the target registers are loaded,
the scoreboard bits are cleared. While the target registers
are being loaded, the processor is allowed to execute
other instructions that do not use these registers.
The processor uses the scoreboard bits to ensure that
target registers are not used until the loads are complete.
(The checking of scoreboard bits is transparent to
software.) The net result of using this technique is that
code can often be optimized in such a way as to allow
some instructions to be executed in parallel.
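
The scoreboard behavior described above can be modeled in a few lines. This is a sketch for illustration only: the class `Scoreboard`, its method names, and the register names used in the example are assumptions, and real hardware tracks the bits per register in the pipeline rather than in a software set.

```python
class Scoreboard:
    """Toy model of register scoreboarding (hypothetical names)."""

    def __init__(self):
        self.busy = set()  # registers whose load is still in flight

    def issue_load(self, dest):
        """A load was issued: set the scoreboard bit for its target."""
        self.busy.add(dest)

    def load_done(self, dest):
        """Data arrived: clear the scoreboard bit."""
        self.busy.discard(dest)

    def can_issue(self, *regs):
        """An instruction may issue only if none of its registers is busy."""
        return not (self.busy & set(regs))


sb = Scoreboard()
sb.issue_load("r4")                  # load into r4 starts
print(sb.can_issue("r5", "r6"))      # independent instruction may proceed
print(sb.can_issue("r4", "r5"))      # instruction using r4 must wait
sb.load_done("r4")
print(sb.can_issue("r4", "r5"))      # r4 is available again
```

The model shows why scheduling independent instructions after a load hides memory latency: only instructions that touch the pending register are held back.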
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Single-Clock Instructions 


It is the intent of the i960 architecture that a processor 
be able to execute commonly used instructions such as 
move, add, subtract, logical operations, compare and 
branch in a minimum number of clock cycles (prefer- 
ably one clock cycle). The architecture supports this 
concept in several ways. For example, the load and 
store model described earlier in this chapter (with its 
concentration on register-to-register operations) allows 
simple operations to be performed without the overhead
of memory-to-memory operations.


Also, all the instructions in the i960 architecture are
32 bits or 64 bits long and aligned on 32-bit boundaries.
This feature allows instructions to be decoded in one
clock cycle. It also eliminates the need for an
instruction-alignment stage in the pipeline.


The design of the i960 MC processor takes full advantage
of these features of the architecture, resulting in
more than 50 instructions that can be executed in a
single clock cycle.


Efficient Interrupt Model


The i960 architecture provides an efficient mechanism
for servicing interrupts from external sources. To handle
interrupts, the processor maintains an interrupt table
of 248 interrupt vectors (240 of which are available
for general use). When an interrupt is signaled, the
processor uses a pointer from the interrupt table to
perform an implicit call to an interrupt handler procedure.
In performing this call, the processor automatically
saves the state of the processor prior to receiving the
interrupt, performs the interrupt routine, and then
restores the state of the processor. A separate interrupt
stack is also provided to segregate interrupt handling
from application programs.


The interrupt handling facilities also feature a method
of prioritizing interrupts. Using this technique, the
processor is able to store interrupts that are lower in
priority than the task the processor is currently working
on in a pending interrupt section of the interrupt
table. At certain defined times, the processor checks the
pending interrupts and services them.
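
The prioritizing scheme described above can be sketched as follows. This is an illustrative model, not the processor's actual algorithm: the class `InterruptController`, its method names, the vector numbers, and the choice to post an interrupt of equal priority as pending are all assumptions.

```python
import heapq


class InterruptController:
    """Toy model of prioritized interrupts with a pending section."""

    def __init__(self):
        self.current_priority = 0  # priority of the work now running
        self._pending = []         # max-heap, stored as (-priority, vector)

    def signal(self, vector, priority):
        """Return the vector to service now, or None if it was posted."""
        if priority > self.current_priority:
            return vector
        heapq.heappush(self._pending, (-priority, vector))
        return None

    def check_pending(self):
        """Return the highest pending vector that now outranks the
        current priority, or None if nothing qualifies."""
        if self._pending and -self._pending[0][0] > self.current_priority:
            return heapq.heappop(self._pending)[1]
        return None


ic = InterruptController()
ic.current_priority = 10
print(ic.signal(0x20, 5))    # lower priority: posted as pending
print(ic.signal(0x30, 20))   # higher priority: serviced at once
ic.current_priority = 0      # the running task drops its priority
print(ic.check_pending())    # the posted interrupt is now serviced
```

In this model `check_pending` stands in for the "certain defined times" mentioned in the text; when those checks occur on the real processor is not modeled here.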
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SIMPLIFIED PROGRAMMING 
ENVIRONMENT 


Partly as a side benefit of its streamlined execution en- 
vironment and partly by design, processors based on 
the i960 architecture are particularly easy to program.
For example, the large number of general-purpose reg- 
isters allows relatively complex algorithms to be execut- 
ed with a minimum number of memory accesses. The 
following paragraphs describe some of the other fea- 
tures that simplify programming. 


Highly Efficient Procedure Call 
Mechanism 


The procedure call mechanism makes procedure calls
and parameter passing between procedures simple and
compact. Each time a call instruction is issued, the
processor automatically saves the current set of local
registers and allocates a new set of local registers for
the called procedure. Likewise, on a return from a
procedure, the current set of local registers is deallocated
and the local registers for the procedure being returned
to are restored. On a procedure call, the program thus
never has to explicitly save and restore those local
variables and parameters that are stored in local registers.
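
The call and return behavior described above, combined with the on-chip local-register sets mentioned earlier in this chapter, can be sketched as a small model. Everything named here is hypothetical: the class `LocalRegisterCache`, the choice of sixteen registers per set, and the policy of writing the oldest set to memory when all on-chip sets are in use are assumptions for illustration.

```python
class LocalRegisterCache:
    """Toy model of local-register allocation on call and return."""

    def __init__(self, on_chip_sets=4, regs_per_set=16):
        self.on_chip_sets = on_chip_sets
        self.regs_per_set = regs_per_set
        self.frames = [[0] * regs_per_set]  # on-chip sets, innermost last
        self.stack = []                     # sets written out to memory
        self.spills = 0                     # number of sets written out

    @property
    def local(self):
        """Local registers of the procedure that is currently running."""
        return self.frames[-1]

    def call(self):
        """Allocate a fresh set; spill the oldest one if the chip is full."""
        if len(self.frames) == self.on_chip_sets:
            self.stack.append(self.frames.pop(0))
            self.spills += 1
        self.frames.append([0] * self.regs_per_set)

    def ret(self):
        """Deallocate the current set and restore the caller's set."""
        self.frames.pop()
        if not self.frames:
            self.frames.append(self.stack.pop())  # reload from memory


cache = LocalRegisterCache()
cache.local[4] = 7      # caller keeps a value in a local register
cache.call()
cache.local[4] = 9      # callee uses its own copy of the same register
cache.ret()
print(cache.local[4])   # the caller's value is intact
print(cache.spills)     # a shallow call needed no memory traffic
```

The program never saves or restores the registers itself; memory traffic appears in the model only when the call depth exceeds the number of on-chip sets.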


Versatile Instruction Set and 
Addressing 


The selection of instructions and addressing modes also
simplifies programming. The architecture offers a full
set of load, store, move, arithmetic, comparison and
branch instructions, with operations on both integer
and ordinal data types. It also provides a complete set
of Boolean and bit-field instructions, to simplify
operations on bits and bit strings.


The addressing modes are efficient and straightforward, 
while at the same time providing the necessary indexing 
and scaling modes required to address complex arrays 
and record structures. 


The large 4-gigabyte address space provides ample 
room to store programs and data. The availability of 32 
addressing lines allows some address lines to be memo- 
ry-mapped to control hardware functions. 


Extensive Fault Handling Capability 


To aid in program development, the i960 architecture
defines a wide selection of faults that the processor de- 
tects, including arithmetic faults, invalid operands, in- 
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valid operations and machine faults. When a fault is 
detected, the processor makes an implicit call to a fault 
handler routine, using a mechanism similar to that de- 
scribed above for interrupts. The information collected 
for each fault allows program developers to quickly 
correct faulting code. It also allows automatic recovery 
from some faults. 


Debugging and Monitoring 


To support debugging systems, the i960 architecture
provides a mechanism for monitoring processor activity
by means of trace events. The processor can be configured
to detect as many as seven different trace events,
including branches, calls, supervisor calls, returns,
prereturns, breakpoints and the execution of any
instruction. When the processor detects a trace event, it
signals a trace fault and calls a fault handler. Intel
provides several tools that use this feature, including an
in-circuit emulator (ICE™) device.


SUPPORT FOR ARCHITECTURAL 
EXTENSIONS 


The i960 architecture described earlier in this chapter 
provides a high-performance computing engine for use 
as the computational and data-processing core of embedded
processors or controllers. The architecture also
provides several features that enable processors based 
on this architecture to be easily customized to meet the 
needs of specific embedded applications, such as signal 
processing, array processing or graphics processing. 


The most important of these features is a set of 32 spe- 
cial-function registers. These registers provide a conve- 
nient interface to circuitry in the processor or to pins 
that can be connected to external hardware. They can 
be used to control timers, to perform operations on spe- 
cial data types or to perform I/O functions. 


The special-function registers are similar to the global 
registers. They can be addressed by all the register-ac- 
cess instructions. 


EXTENSIONS INCLUDED IN THE
80960MC PROCESSOR


The extensions to the i960 architecture included in the
i960 MC processor are built on top of the processor’s 
core computing engine. These extensions are aimed at 
improving the efficiency and reliability of embedded 
systems. 
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On-Chip Floating Point 


The i960 MC processor provides a complete implementation
of the IEEE standard for binary floating-point
arithmetic (IEEE 754-1985). This implementation includes
a full set of floating-point operations, including
add, subtract, multiply, divide, trigonometric functions
and logarithmic functions. These operations are performed
on single precision (32-bit), double precision
(64-bit) and extended precision (80-bit) real numbers.


One of the benefits of this implementation is that the
floating-point handling facilities are completely integrated
into the normal instruction execution environment.
Single- and double-precision floating-point values
are stored in the same registers as non-floating point
values. Also, four 80-bit floating-point registers are
provided to hold extended-precision values.


String and Decimal Operations 

The i960 MC processor provides several instructions
for moving, filling and comparing byte strings in mem- 
ory. These instructions speed up string operations and 
reduce the amount of code required to handle strings. 


The decimal instructions perform move, add with carry
and subtract with carry operations on binary coded
decimal (BCD) strings.


Virtual-Memory Support


Another of the i960 MC processor’s important features
is support for virtual-memory management. When using
the processor in virtual-memory mode, the processor
provides each process (or task) with an address
space of up to 2^32 bytes. This address space is paged
into physical memory in 4 Kbyte pages. On-chip
memory-management facilities handle virtual-to-physical
address translation. A translation look-aside buffer
(TLB) speeds address translation by storing
virtual-to-physical address translations for frequently accessed
parts of memory, such as the location of the page tables
and the location of often used system data structures.
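
The arithmetic behind the paging scheme can be worked through directly: a 2^32-byte address space divided into 4 Kbyte pages gives 2^20 pages, with the low 12 address bits selecting a byte within the page. The sketch below illustrates this together with a TLB lookup. The function names, the flat dictionary used as a page table, and the frame numbers are assumptions for illustration; the processor's actual page-table format is not described in this chapter.

```python
PAGE_SIZE = 4 * 1024                      # 4 Kbyte pages, as stated in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits of byte offset
ADDRESS_SPACE = 2 ** 32                   # bytes of address space per task
NUM_PAGES = ADDRESS_SPACE // PAGE_SIZE    # 2**20 pages


def split_address(vaddr):
    """Split a 32-bit virtual address into (page number, byte offset)."""
    if not 0 <= vaddr < ADDRESS_SPACE:
        raise ValueError("address outside the 2**32-byte space")
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr, tlb, page_table):
    """Translate through the TLB, filling it from the page table on a miss."""
    page, offset = split_address(vaddr)
    frame = tlb.get(page)
    if frame is None:
        frame = page_table[page]  # a KeyError here stands in for a page fault
        tlb[page] = frame
    return (frame << OFFSET_BITS) | offset


tlb = {}
page_table = {0x12345: 0x00042}           # hypothetical mapping
print(hex(translate(0x12345678, tlb, page_table)))
print(0x12345 in tlb)                     # the translation is now cached
```

A second access to the same page is satisfied from `tlb` without consulting the page table, which is the saving the TLB provides.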
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Protection | 


The i960 MC processor offers two mechanisms for protecting
critical data structures or software modules.
The first is the ability to use page rights bits to restrict
access to individual pages. Page rights allow various
levels of access to be assigned to a page, ranging from
no access to read only to read-write.


The second protection mechanism is a user/supervisor
protection model. This two-level protection model provides
hardware enforced protection of kernel procedures
and data structures. When using this protection
mechanism, privileged procedures and data are placed
in protected pages of memory. These pages can then be
accessed only through a procedure table, which provides
a tightly controlled interface to kernel functions.


Multitasking

The i960 MC processor offers a variety of process
management facilities to support concurrent execution of
multiple tasks. These facilities can be divided into two
groups: process scheduling and interprocess
communications.


The process scheduling facilities consist of a set of
general-purpose data structures and instructions, which are
designed to support several different multitasking
schemes. For example, the processor provides a set of
instructions that allow the kernel to explicitly dispatch
a task (bind it to the processor) and to suspend a task
(save the current state of a task so that another task can
be bound to the processor). These instructions can be
used within kernel procedures to schedule, dispatch
and preempt multiple tasks.


The processor also provides a unique feature called self
dispatching. Here, the kernel schedules tasks by queuing
them to a dispatch port. Thereafter, the processor
handles the dispatching, preempting and rescheduling
of the tasks automatically, independent of the kernel.
When using this mechanism, tasks can be scheduled by
priority, with up to 32 priority levels to choose from.
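
Priority scheduling through a dispatch port can be sketched as a set of queues, one per priority level. This is a minimal illustration rather than the processor's data structure: the class `DispatchPort`, the convention that a larger number means higher priority, and first-in first-out ordering within a level are all assumptions.

```python
from collections import deque


class DispatchPort:
    """Toy dispatch port with one FIFO queue per priority level."""

    LEVELS = 32  # number of priority levels, from the text

    def __init__(self):
        self.queues = [deque() for _ in range(self.LEVELS)]

    def enqueue(self, task, priority):
        """Kernel side: schedule a task by queuing it at its priority."""
        if not 0 <= priority < self.LEVELS:
            raise ValueError("priority must be in 0..31")
        self.queues[priority].append(task)

    def dispatch(self):
        """Processor side: take the oldest task at the highest level."""
        for priority in range(self.LEVELS - 1, -1, -1):
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None


port = DispatchPort()
port.enqueue("logger", 3)
port.enqueue("control-loop", 28)
port.enqueue("telemetry", 3)
print(port.dispatch())   # highest priority first
print(port.dispatch())   # then FIFO order within priority 3
```

Several consumers calling `dispatch` on one shared port would each receive the next most urgent task, which mirrors the load sharing between processors described under Multiprocessing; the locking needed for that case is not shown.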
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The processor’s interprocess communication facilities 


include support for semaphores and communication 
ports. These facilities allow synchronization of interde- 
pendent tasks and asynchronous communication be- 
tween tasks. 


Multiprocessing 


The i960 MC processor provides several mechanisms 
designed to simplify the design of multiple-processor 
systems, allowing several processors to run in parallel, 
using shared memory resources. One of these mecha- 
nisms is the self-dispatching capability described above. 
Here, two or more processors can schedule and dis- 
patch processes from a single dispatch port, with each 
processor equally sharing the processing load. 


The processor also provides an inter-agent communication
(IAC) mechanism that allows processors to exchange
messages among themselves on the bus. This
mechanism operates similarly to the interrupt mechanism,
except that IAC messages are passed through
dedicated sections of memory. The IAC mechanism
can be used to preempt processes running on another
processor, to manage interrupt handling or to initialize
and synchronize several processors.


A set of atomic instructions is also provided to
synchronize memory accesses. Multiple processors can
then access shared memory without introducing
inaccuracies and ambiguities into shared data structures.
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Fault Tolerance 


The i960 family of components supports fault-tolerant
system design through the use of the M82965 Bus Extension
Unit component. The M82965 allows two processors
to be operated in tandem to form a self-checking
module. The two M82965s check the outputs of two
processors (a master and a checker) cycle-by-cycle. If
the checking M82965 detects a difference between outputs,
it signals an error. A software recovery procedure
can then be initiated.
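
The cycle-by-cycle comparison of a master and a checker can be illustrated in software. The sketch below is only a model of the idea: the function `first_mismatch` and the representation of each cycle's outputs as a Python value are assumptions, and the M82965 performs this comparison in hardware on the bus signals.

```python
def first_mismatch(master_outputs, checker_outputs):
    """Return the index of the first cycle whose outputs differ, or None.

    Each argument is a sequence with one entry per clock cycle. Only the
    cycles present in both sequences are compared.
    """
    pairs = zip(master_outputs, checker_outputs)
    for cycle, (master, checker) in enumerate(pairs):
        if master != checker:
            return cycle  # the point at which an error would be signaled
    return None


master = [0x1000, 0x1004, 0x1008, 0x100C]
checker = [0x1000, 0x1004, 0x1009, 0x100C]   # one corrupted output
print(first_mismatch(master, checker))
print(first_mismatch(master, master))
```

Reporting the cycle of the first difference gives a recovery procedure a place to start, for example by rolling back to state saved before that cycle.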


This fault detection mechanism supports several fault 
detection and recovery techniques, including self heal- 
ing, and continuous-operation (non-stop) systems. 


LOOK FOR MORE IN THE FUTURE 


The i960 architecture offers exceptional performance,
plus a wealth of useful features to help in the design of
efficient and reliable embedded systems. But equally
important, it offers lots of room to grow. The i960 MC
processor provides average instruction processing rates
of 7.5 million instructions per second (7.5 MIPS) at a
20 MHz clock rate and 10 MIPS at a 25 MHz clock
rate(1).


However, the i960 MC processor is only the beginning.
With improvements in VLSI technology, future imple- 
mentations of the i960 architecture will offer even 
greater performance. They will also offer a variety of 
useful extensions to solve specific control and monitor- 
ing needs in the field of embedded applications. 


1. 1 MIP is equivalent to the performance of a Digital Equipment Corp. VAX 11/780. 
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80960MC 
EMBEDDED 32-BIT MICROPROCESSOR 
WITH INTEGRATED FLOATING-POINT UNIT 
AND MEMORY MANAGEMENT UNIT


Military

■ High-Performance Embedded Architecture
  - 25 MIPS Burst Execution at 25 MHz
  - 9.4 MIPS* Sustained Execution at 25 MHz
■ On-Chip Floating-Point Unit
  - Supports IEEE 754 Floating-Point Standard
  - Full Transcendental Support
  - Four 80-Bit Registers
  - 5.2 Million Whetstones/Second at 20 MHz
■ 512-Byte On-Chip Instruction Cache
  - Direct Mapped
  - Parallel Load/Decode for Uncached Instructions
■ Multiple Register Sets
  - Sixteen Global 32-Bit Registers
  - Sixteen Local 32-Bit Registers
  - Four Local Register Sets Stored On-Chip (Sixteen 32-Bit Registers per Set)
  - Register Scoreboarding
■ On-Chip Memory Management Unit
  - 4 Gigabyte Virtual Address Space per Task
  - 4 Kbyte Pages with Supervisor/User Protection
■ Built-In Interrupt Controller
  - 32 Priority Levels
  - 248 Vectors
  - Supports M8259A
  - 3.4 µs Latency
■ Easy to Use, High Bandwidth 32-Bit Bus
  - 66.7 MBytes/s Burst
  - Up to 16 Bytes Transferred per Burst
■ Multitasking and Multiprocessor Support
  - Automatic Task Dispatching
  - Prioritized Task Queues
■ Advanced Package Technology
  - 132 Lead Ceramic Pin Grid Array
  - 164 Lead Ceramic Quad Flatpack
■ Military Temperature Range
  - -55°C to +125°C (TC)


The 80960MC is the enhanced military member of Intel’s new 32-bit microprocessor family, the 960 series,
which is designed especially for embedded applications. It is based on the family’s high performance,
common core architecture, and includes a 512-byte instruction cache, a built-in interrupt controller, an integrated
floating-point unit and a memory management unit. The 80960MC has a large register set, multiple parallel
execution units, and a high-bandwidth, burst bus. Using advanced RISC technology, this high performance
processor can respond to interrupts in under 3.4 µs and is capable of execution rates in excess of 9.4 million
instructions per second.* The 80960MC is well-suited for a wide range of military and other high reliability
applications, including avionics, airborne radar, navigation, and instrumentation.


*Relative to Digital Equipment Corporation’s VAX-11/780** at 1 MIPS 
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Figure 1. The 80960MC’s 5 Highly F Parallel Microarchitecture 
**VAX™ is a trademark of Digital Equipment Corporation.
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THE 960 SERIES 


The 80960MC is the enhanced military member of a 
new family of 32-bit microprocessors from Intel 
known as the 960 Series. This series was especially 
designed to serve the needs of embedded applica- 
tions. The embedded market includes applications 
as diverse as industrial automation, avionics, image 
processing, graphics, robotics, telecommunications, 
and automobiles. These types of applications re- 
quire high integration, low power consumption, quick 
interrupt response times, and high performance. 
Since time to market is critical, embedded micro- 
processors need to be easy to use in both hardware 
and software designs. 


SIXTEEN | 
32-BIT 
REGISTERS 


GLOBAL 
REGISTERS() 
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All members of the 80960 series share a common 
core architecture which utilizes RISC technology so
that, except for special functions, the family members
are object code compatible. Each new processor in the
series will add its own special set of functions to the
core to satisfy the needs of a specific application or
range of applications in the embedded market. For
example, future processors may include a DMA controller,
a timer, or an A/D converter.


The 80960MC includes an integrated Floating Point 
Unit (FPU), a Memory Management Unit (MMU), 
multitasking support, and multiprocessor support. 
There are also two commercial members of the family:
the 80960KB processor with integrated FPU and
the 80960KA without floating-point.


FLOATING- 


FOUR 80-BIT REGISTERS 
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32-BIT 
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LOCAL 
REGISTERS(2) 


32-BITS 


32-BITS INSTRUCTION POINTER 


32-BITS PROCESS CONTROLS 


32-BITS TRACE CONTROLS 


ARITHMETIC CONTROLS

POINT REGISTERS

ADDRESS SPACE


NOTES:
1. Register g15 is reserved for stack management functions.
2. Registers r0, r1, and r2 are reserved for stack management functions.


Figure 2. Register Set 
KEY PERFORMANCE FEATURES

The 80960MC's architecture is based on the most recent advances in
RISC technology and is grounded in Intel's long experience in
designing embedded controllers. Many features contribute to the
80960MC's exceptional performance:

1. Large Register Set. Having a large number of registers reduces
the number of times that a processor needs to access memory. Modern
compilers can take advantage of this feature to optimize execution
speed. For maximum flexibility, the 80960MC provides thirty-two
32-bit registers (sixteen local and sixteen global) and four 80-bit
floating-point global registers. (See Figure 2.)

2. Fast Instruction Execution. Simple functions make up the bulk of
instructions in most programs, so that execution speed can be
greatly improved by ensuring that these core instructions execute in
as short a time as possible. The most-frequently executed
instructions such as register-register moves, add/subtract, logical
operations, and shifts execute in one to two cycles. (Table 1
contains a list of instructions.)

3. Load/Store Architecture. Like other processors based on RISC
technology, the 80960MC has a Load/Store architecture: only the LOAD
and STORE instructions reference memory; all other instructions
operate on registers. This type of architecture simplifies
instruction decoding and is used in combination with other
techniques to increase parallelism.

[Figure: the five instruction formats: Control (Opcode,
Displacement); Compare and Branch; Register to Register (Opcode,
Reg/Lit operands); Memory Access (Short); Memory Access (Long).]

Figure 3. Instruction Formats
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Table 1. 80960MC Instruction Set 


Data Movement: Load, Store, Move, Load Address, Load Physical
Address

Arithmetic: Add, Subtract, Multiply, Divide, Remainder, Modulo,
Shift

Floating Point: Add, Subtract, Multiply, Divide, Remainder, Scale,
Round, Square Root, Sine, Cosine, Tangent, Arctangent, Log, Log
Binary, Log Natural, Exponent, Classify, Copy Real Extended, Compare

Logical: And, Not And, And Not, Or, Exclusive Or, Not Or, Or Not,
Nor, Exclusive Nor, Not, Nand, Rotate

Bit and Bit Field: Set Bit, Clear Bit, Not Bit, Check Bit, Alter
Bit, Scan for Bit, Scan over Bit, Extract, Modify

Comparison: Compare, Conditional Compare, Compare and Increment,
Compare and Decrement

Branch: Unconditional Branch, Conditional Branch, Compare and Branch

String: Move String, Move Quick String, Fill String, Compare String,
Scan Byte for Equal

Conversion: Convert Real to Integer, Convert Integer to Real

Decimal: Move, Add with Carry, Subtract with Carry

Call/Return: Call, Call Extended, Call System, Return, Branch and
Link

Fault: Conditional Fault, Synchronize Faults

Debug: Modify Trace Controls, Mark, Force Mark

Miscellaneous: Flush Local Registers, Inspect Access, Modify
Arithmetic Controls, Test Condition Code

Process Management: Schedule Process, Save Process, Resume Process,
Load Process Time, Modify Process Controls, Wait, Conditional Wait,
Signal, Receive, Conditional Receive, Send, Send Service, Atomic
Add, Atomic Modify
4. Simple Instruction Formats. All instructions in the 80960MC are
32 bits long and must be aligned on word boundaries. This alignment
makes it possible to eliminate the instruction-alignment stage in
the pipeline. To simplify the instruction decoder further, there are
only five instruction formats and each instruction uses only one
format. (See Figure 3.)

5. Overlapped Instruction Execution. A load operation allows
execution of subsequent instructions to continue before the data has
been returned from memory, so that these instructions can overlap
the load. The 80960MC manages this process transparently to software
through the use of a register scoreboard. Conditional instructions
also make use of a scoreboard so that subsequent unrelated
instructions can be executed while the conditional instruction is
pending.

6. Integer Execution Optimization. When the result of an operation
is used as an operand in a subsequent calculation, the value is sent
immediately to its destination register. At the same time, the value
is put back on a bypass path to the ALU, thereby saving the time
that otherwise would be required to retrieve the value for the next
operation.

7. Bandwidth Optimizations. The 80960MC gets optimal use of its
memory bus bandwidth because the bus is tuned for use with the
cache: the line size of the instruction cache matches the maximum
burst size for instruction fetches. The 80960MC automatically
fetches four words in a burst and stores them directly in the cache.
Due to the size of the cache and the fact that it is continually
filled in anticipation of needed instructions in the program flow,
the 80960MC is exceptionally insensitive to memory wait states. In
fact, each wait state causes only a 7% degradation in system
performance. The benefit is that the 80960MC will deliver
outstanding performance even with a low cost memory system.

8. Cache Bypass. If there is a cache miss, the processor fetches the
needed instruction, then sends it on to the instruction decoder at
the same time it updates the cache. Thus, no extra time is taken to
load and read the cache.


Memory Space and Addressing Modes


The 80960MC allows each task (process) to address a logical memory
space of up to 4 Gbytes. In turn, each task's address space is
divided into four 1-Gbyte regions and each region can be mapped to
physical addresses by zero, one, or two levels of page tables. The
region with the highest addresses (Region 3) is common to all tasks.
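
The region split described above can be illustrated with a short sketch. It assumes only what the text states (a 32-bit logical address, four equal 1-Gbyte regions numbered from the lowest addresses upward, Region 3 shared by all tasks); the function names are hypothetical and chosen for illustration.

```python
REGION_BITS = 30          # 1 Gbyte = 2**30 bytes
MAX_ADDRESS = 0xFFFF_FFFF  # 4-Gbyte logical address space


def region_of(address: int) -> int:
    """Return the region number (0-3) containing a 32-bit logical address."""
    if not 0 <= address <= MAX_ADDRESS:
        raise ValueError("address must fit in 32 bits")
    # The two most significant address bits select the region.
    return address >> REGION_BITS


def is_shared(address: int) -> bool:
    """True when the address lies in Region 3, which is common to all tasks."""
    return region_of(address) == 3
```

Under these assumptions, address 0x4000_0000 is the first byte of Region 1, and every address from 0xC000_0000 upward belongs to the shared region.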
In keeping with RISC design principles, the number of addressing
modes has been kept to a minimum but includes all those necessary to
ensure efficient execution of high-level languages such as Ada, C,
and Fortran. Table 2 lists the memory addressing modes.


Data Types

The 80960MC recognizes the following data types:


Numeric:

• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers
• 32-, 64- and 80-bit real numbers


Non-Numeric:

• Bit
• Bit Field
• Triple-Word (96 bits)
• Quad-Word (128 bits)


Large Register Set


The programming environment of the 80960MC includes a large number
of registers. In fact, 36 registers are available at any time. The
availability of this many registers greatly reduces the number of
memory accesses required to execute most programs, which leads to
greater instruction processing speed.


There are two types of general-purpose registers: local and global.
The 20 global registers consist of sixteen 32-bit registers (G0
through G15) and four 80-bit registers (FP0 through FP3). These
registers perform the same function as the general-purpose registers
provided in other popular microprocessors. The term global refers to
the fact that these registers retain their contents across procedure
calls.


The local registers, on the other hand, are procedure specific. For
each procedure call, the 80960MC allocates 16 local registers (R0
through R15). Each local register is 32 bits wide. Any register can
also be used for floating-point operations; the 80-bit
floating-point registers are provided for extended precision.


Multiple Register Sets


To further increase the efficiency of the register set, 
multiple sets of local registers are stored on-chip. 
This cache holds up to four local register frames, 
which means that up to three procedure calls can be 
made without having to access the procedure stack 
resident in memory. 
Table 2. Memory Addressing Modes

• 12-Bit Offset
• 32-Bit Offset
• Register-Indirect
• Register + 12-Bit Offset
• Register + 32-Bit Offset
• Register + (Index-Register × Scale-Factor)
• (Index-Register × Scale-Factor) + 32-Bit Displacement
• Register + (Index-Register × Scale-Factor) + 32-Bit Displacement

Scale-Factor is 1, 2, 4, 8 or 16
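
Every mode in the addressing-mode table is a special case of one sum: base register plus scaled index plus offset or displacement. The sketch below is a minimal illustration of that arithmetic, not a model of the instruction encoding. The function name, the keyword arguments, and the wrap-around to 32 bits are assumptions made for the example.

```python
VALID_SCALE_FACTORS = (1, 2, 4, 8, 16)
MASK32 = 0xFFFF_FFFF


def effective_address(base: int = 0, index: int = 0,
                      scale: int = 1, displacement: int = 0) -> int:
    """Compute base + index * scale + displacement as a 32-bit address.

    Leaving an argument at its default drops that term, which yields the
    simpler modes (for example, offset only, or register-indirect).
    """
    if scale not in VALID_SCALE_FACTORS:
        raise ValueError("scale factor must be 1, 2, 4, 8 or 16")
    # Assumption: the result is truncated to 32 bits.
    return (base + index * scale + displacement) & MASK32
```

For instance, indexing element 5 of an array of 8-byte items whose base address is held in a register corresponds to effective_address(base=array_base, index=5, scale=8).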


Although programs may have procedure calls nest- 
ed many calls deep, a program typically oscillates 
back and forth between only two or three levels. As 
a result, with four stack frames in the cache, the 
probability of there being a free frame on the cache 
when a call is made is very high. In fact, runs of 
representative C-language programs show that 80% 
of the calls are handled without needing to access 
memory.


If there are four or more active procedures and a new procedure is
called, the processor moves the oldest set of local registers in the
register cache to a procedure stack in memory to make room for a new
set of registers. Global register G15 is used by the processor as
the frame pointer (FP) for the procedure stack.


Note that the global and floating-point registers are not exchanged
on a procedure call, but retain their contents, making them
available to all procedures for fast parameter passing. An
illustration of the register cache is shown in Figure 4.


[Figure: the register cache, holding four local register sets, one
of which is in use by the current procedure.]

Figure 4. Multiple Register Sets Are Stored On-Chip
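
The behavior of the four-frame register cache can be sketched with a small simulation that counts how often a call or return has to touch the procedure stack in memory. The class and method names are hypothetical. Reloading a spilled frame on return is an assumption, since the text only describes the spill on call.

```python
class RegisterCache:
    """Toy model of an on-chip cache of local register frames."""

    def __init__(self, frames: int = 4):
        self.frames = frames        # frames that fit on-chip
        self.resident = 1           # the running procedure's frame
        self.spilled = 0            # frames saved to the memory stack
        self.memory_accesses = 0    # spills plus reloads

    def call(self) -> None:
        if self.resident == self.frames:
            # Cache full: move the oldest frame out to the memory stack.
            self.resident -= 1
            self.spilled += 1
            self.memory_accesses += 1
        self.resident += 1

    def ret(self) -> None:
        if self.resident + self.spilled <= 1:
            raise RuntimeError("return without a matching call")
        self.resident -= 1
        if self.resident == 0:
            # Assumption: the caller's frame is reloaded from memory.
            self.spilled -= 1
            self.resident = 1
            self.memory_accesses += 1
```

With the default of four frames, three nested calls complete with no memory access, and only the fourth forces a spill, which matches the figures given in the text.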
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Instruction Cache 


To further reduce memory accesses, the 80960MC includes a 512-byte
on-chip instruction cache. The instruction cache is based on the
concept of locality of reference; that is, most programs are not
usually executed in a steady stream but consist of many branches and
loops that lead to jumping back and forth within the same small
section of code. Thus, by maintaining a block of instructions in a
cache, the number of memory references required to read instructions
into the processor can be greatly reduced.


To load the instruction cache, instructions are fetched in 16-byte
blocks, so that up to four instructions can be fetched at one time.
An efficient prefetch algorithm increases the probability that an
instruction will already be in the cache when it is needed.


Code for small loops will often fit entirely within the cache,
leading to a great increase in processing speed since further memory
references might not be necessary until the program exits the loop.
Similarly, when calling short procedures, the code for the calling
procedure is likely to remain in the cache, so it will be there on
the procedure's return.
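
The cache geometry follows directly from the two numbers given: 512 bytes in 16-byte blocks is 32 blocks of four instructions each. The sketch below works through that arithmetic. The mapping of addresses to blocks (direct-mapped, low-order line number selects the slot) is an assumption made for illustration and is not stated in the text; the function names are hypothetical.

```python
CACHE_BYTES = 512
LINE_BYTES = 16                          # one four-word burst
NUM_LINES = CACHE_BYTES // LINE_BYTES    # 32 lines


def cache_slot(address: int) -> tuple:
    """Return (slot index, tag) for an instruction address.

    Assumption: a direct-mapped organization.
    """
    line_number = address // LINE_BYTES
    return line_number % NUM_LINES, line_number // NUM_LINES


def loop_fits(start: int, end: int) -> bool:
    """True if the code in [start, end) spans at most NUM_LINES lines."""
    first_line = start // LINE_BYTES
    last_line = (end - 1) // LINE_BYTES
    return last_line - first_line + 1 <= NUM_LINES
```

Note that a 512-byte loop only fits when it starts on a 16-byte boundary; shifted by 8 bytes, the same loop straddles 33 lines.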


Register Scoreboarding


The instruction decoder has been optimized in several ways. One of
these optimizations is the ability to do instruction overlapping by
means of register scoreboarding.


Register scoreboarding occurs when a LOAD instruction is executed to
move a variable from memory into a register. When the instruction is
initiated, a scoreboard bit on the target register is set. When the
register is actually loaded, the bit is reset. In between, any
reference to the register contents is accompanied by a test of the
scoreboard bit to ensure that the load has completed before
processing continues. Since the processor does not have to wait for
the LOAD to be completed, it can go on to execute additional
instructions placed in between the LOAD instruction and the
instruction that uses the register contents, as shown in the
following example:


    LOAD R4, address 1
    LOAD R5, address 2
    Unrelated instruction
    Unrelated instruction
    ADD R4, R5, R6


In essence, the two unrelated instructions between the LOAD and ADD
instructions are executed for free (i.e., take no apparent time to
execute) because they are executed while the register is being
loaded.


Up to three LOAD instructions can be pending at one time with three
corresponding scoreboard bits set. By exploiting this feature,
system programmers and compilers have a useful tool for optimizing
execution speed.
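
A toy timing model makes the "free" instructions concrete. The sketch below is illustrative only: the one-cycle issue time, the four-cycle load latency, and the tuple format (operation, destination, sources) are assumptions, not figures from this document. It does respect the stated limit of three pending loads.

```python
MAX_PENDING_LOADS = 3


def simulate(program, load_latency=4):
    """Return the cycle count for a list of (op, dest, *sources) tuples."""
    pending = {}   # register -> cycle at which its scoreboard bit clears
    t = 0          # cycle in which the next instruction may issue
    for op, dest, *sources in program:
        # Stall while any register this instruction touches is still loading.
        stalls = [pending[r] for r in (dest, *sources)
                  if pending.get(r, 0) > t]
        in_flight = [c for c in pending.values() if c > t]
        if op == "load" and len(in_flight) >= MAX_PENDING_LOADS:
            # A fourth load waits for the earliest outstanding one.
            stalls.append(min(in_flight))
        if stalls:
            t = max(stalls)
        if op == "load":
            pending[dest] = t + load_latency   # set the scoreboard bit
        t += 1                                 # assumed one-cycle issue
    return max([t, *pending.values()])


loads = [("load", "r4"), ("load", "r5")]
filler = [("op", "r8", "r9"), ("op", "r10", "r11")]
add = [("add", "r6", "r4", "r5")]
with_filler = simulate(loads + filler + add)
without_filler = simulate(loads + add)
```

Under these assumptions both sequences take the same number of cycles, so the two unrelated instructions add no apparent time.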


Memory Management and Protection 


The 80960MC will be especially useful for multitasking applications
that require software protection and a very large address space. To
ensure the highest level of performance possible, the memory
management unit and translation look-aside buffer (TLB) are
contained on-chip.


The 80960MC supports a conventional form of demand-paged virtual
memory in which the address space is divided into 4 Kbyte pages.
Studies have shown that a 4 Kbyte page is the optimum size for a
broad range of applications.


Each page table entry includes a 2-bit page rights field that
specifies whether the page is a no-access, read-only, or read-write
page. This field is interpreted differently depending on whether the
current task (process) is executing in user or supervisor mode, as
shown below:


    Rights    User          Supervisor
    00        No Access     Read-Only
    01        No Access     Read-Write
    10        Read-Only     Read-Write
    11        Read-Write    Read-Write
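
The rights table above translates directly into a lookup. The sketch below is a minimal illustration: the table contents and the 4 Kbyte page size come from the text, while the function names and the string labels are hypothetical.

```python
PAGE_SIZE = 4 * 1024   # 4 Kbyte pages

# rights field -> (user access, supervisor access), as in the table above
PAGE_RIGHTS = {
    0b00: ("no-access", "read-only"),
    0b01: ("no-access", "read-write"),
    0b10: ("read-only", "read-write"),
    0b11: ("read-write", "read-write"),
}


def access_allowed(rights: int, supervisor: bool, write: bool) -> bool:
    """Check a read or write against the 2-bit page rights field."""
    user_level, supervisor_level = PAGE_RIGHTS[rights]
    level = supervisor_level if supervisor else user_level
    if level == "no-access":
        return False
    if level == "read-only":
        return not write
    return True


def split_address(address: int) -> tuple:
    """Split an address into (page number, offset within the page)."""
    return divmod(address, PAGE_SIZE)
```

For example, a page marked 10 can be read but not written by a user-mode task, while supervisor code may do both.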


Floating-Point Arithmetic 


In the 80960MC, floating-point arithmetic has been 
made an integral part of the architecture. Having the 
floating-point unit integrated on-chip provides two 
advantages. First, it improves the performance of 
the chip for floating-point applications, since no 
additional bus overhead is associated with floating- 
point calculations, thereby leaving more time for oth- 
er bus operations such as I/O. Second, the cost of 
using floating-point operations is reduced because a 
separate coprocessor chip is not required. 


The 80960MC floating-point (real number) data types include
single-precision (32-bit), double-precision (64-bit), and extended
precision (80-bit) floating-point numbers. Any register may be used
to execute floating-point operations.




The processor provides hardware support for both 
mandatory and recommended portions of IEEE 
Standard 754 for floating-point arithmetic, including 
all arithmetic, exponential, logarithmic, and other 
transcendental functions. Table 3 shows execution 
times for some representative instructions. 


Table 3. Sample Floating-Point Execution Times (µs) at 25 MHz


    Function        32-Bit    64-Bit

    Add
    Subtract
    Multiply
    Divide
    Square Root
    Arctangent
    Exponent
    Sine
    Cosine

Multitasking Support 


Multitasking programs commonly involve the monitoring and control of
an external operation, such as the activities of a process
controller or the movements of a machine tool. These programs
generally consist of a number of processes that run independently of
one another, but share a common database or pass data among
themselves.


The 80960MC offers several hardware functions designed to support
multitasking systems. One unique feature, called self-dispatching,
allows a processor to switch itself automatically among scheduled
tasks. When self-dispatching is used, all the operating system is
required to do is place the task in the scheduling queue.


When the processor becomes available, it dis- 
patches the task from the beginning of the queue 
and then executes it until it becomes blocked, inter- 
rupted, or until its time-slice expires. It then returns 
the task to the end of the queue (i.e., automatically 
reschedules it) and dispatches the next ready task. 
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During these operations, no communication be- 
tween the processor and the operating system is 
necessary until the running task is complete or an 
interrupt is issued. 
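
The dispatching cycle described above (take the task at the head of the queue, run it for at most one time slice, put it back at the tail if it has not finished) can be sketched as follows. This is a software illustration of the queue discipline only. The function name, the abstract "work units", and the omission of blocking and interrupts are simplifying assumptions.

```python
from collections import deque


def self_dispatch(tasks, time_slice):
    """Run tasks round-robin and return the dispatch trace.

    tasks maps a task name to its remaining work units; the trace is a
    list of (task name, units run) pairs in dispatch order.
    """
    queue = deque(tasks.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()      # dispatch from the head
        ran = min(remaining, time_slice)       # run until the slice expires
        trace.append((name, ran))
        if remaining > ran:
            # Not finished: automatically reschedule at the tail.
            queue.append((name, remaining - ran))
    return trace
```

With tasks {"A": 3, "B": 1} and a time slice of 2, task A runs for two units, B runs to completion, and A then finishes its last unit.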


Synchronization and Communication 


The 80960MC also offers instructions to set up and test semaphores
to ensure that concurrent tasks remain synchronized and no data
inconsistency results. Special data structures, known as
communication ports, provide the means for exchanging parameters and
data structures. Transmission of information by means of
communication ports is asynchronous and automatically buffered by
the processor.


Communication between tasks by means of ports 
can be carried out independently of the operating 
system. Once the ports have been set up by the 
programmer, the processor handles the message 
passing automatically. 


High Bandwidth Local Bus 


An 80960MC CPU resides on a high-bandwidth address/data bus known as
the local bus (L-Bus). The L-Bus provides a direct communication
path between the processor and the memory and I/O subsystem
interfaces. The processor uses the local bus to fetch instructions,
manipulate memory, and respond to interrupts. Its features include:


• 32-bit multiplexed address/data path

• Four-word burst capability, which allows transfers from 1 to 16
  bytes at a time

• High bandwidth reads and writes at 66.7 MBytes per second

• Special signal to indicate whether a memory transaction can be
  cached


Figure 5 identifies the groups of signals which con- 
stitute the L-Bus. Table 4 lists the function of the L- 
Bus and other processor-support signals, such as 
the interrupt lines. 
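
The 66.7 MBytes per second figure can be reproduced with simple arithmetic if a four-word burst is assumed to occupy six clock cycles at 25 MHz (one address cycle, four data cycles, one recovery cycle). That cycle count is an assumption chosen because it matches the stated number; the function below is a hypothetical helper, not a timing specification.

```python
BYTES_PER_WORD = 4


def burst_bandwidth_mbytes(clock_mhz: float, words: int = 4,
                           overhead_cycles: int = 2) -> float:
    """Peak burst bandwidth in MBytes per second.

    overhead_cycles covers the assumed address and recovery cycles that
    surround the data cycles of one burst.
    """
    cycles_per_burst = words + overhead_cycles
    return words * BYTES_PER_WORD * clock_mhz / cycles_per_burst
```

At 25 MHz this gives 16 bytes every 6 cycles, or about 66.7 MBytes per second; the same assumption gives proportionally lower figures at the 16 and 20 MHz speed grades.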
[Figure: local bus signal groups: address/data (32 lines); control
(address, data, and operation signals; 15 lines); arbitration
(2 lines).]

Figure 5. Local Bus Signal Groups


Multiple Processor Support


One means of increasing the processing power of a system is to run
two or more processors in parallel. Since microprocessors are not
generally designed to run in tandem with other processors, designing
such a system is usually difficult and costly.


The 80960MC solves this problem by offering a number of functions to
coordinate the actions of multiple processors. First, messages can
be passed between processors to initiate actions such as flushing a
cache, stopping or starting another processor, or preempting a task.
The messages are passed on the bus and allow multiple processors to
run together smoothly, with rare need to lock the bus or memory.


Second, a set of synchronization instructions helps maintain the
coherency of memory. These instructions permit several processors to
modify memory at the same time without inserting inaccuracies or
ambiguities into shared data structures.


The self-dispatching mechanism, in addition to being used in
single-processor systems, provides the means to increase the
performance of a system merely by adding processors. Each processor
can either work on the same pool of tasks (sharing the same queue
with other processors) or can be restricted to its own queue.


When processors perform system operations, they synchronize
themselves by using atomic operations and sending special messages
between each other. Changing the number of processors in a system
never requires a software change. Software will execute correctly
regardless of the number of processors in the system; systems with
more processors simply execute faster.


Interrupt Handling


The 80960MC can be interrupted in one of two ways: by the activation
of one of four interrupt pins or by sending a message on the
processor's data bus.


The 80960MC is unusual in that it automatically handles interrupts
on a priority basis and tracks pending interrupts through its
on-chip interrupt controller. Two of the interrupt pins can be
configured to provide M8259A handshaking for expansion beyond four
interrupt lines.


An interrupt message is made up of a vector number and an interrupt
priority. If the interrupt priority is greater than that of the
currently running task, the processor accepts the interrupt and uses
the vector as an index into the interrupt table. If the priority of
the interrupt message is below that of the current task, the
processor saves the information in a section of the interrupt table
reserved for pending interrupts.
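
The accept-or-pend decision can be sketched in a few lines. The sketch covers only the two cases the text describes (priority greater than, or below, that of the running task); treating an equal priority as pending is an assumption, as are the function name and the list used to stand in for the pending section of the interrupt table.

```python
def post_interrupt(vector: int, priority: int,
                   current_priority: int, pending: list) -> bool:
    """Return True if the interrupt is accepted now, False if it is pended.

    pending is a list standing in for the section of the interrupt table
    reserved for pending interrupts.
    """
    if priority > current_priority:
        # Accepted: the vector indexes the interrupt table.
        return True
    # Assumption: an equal priority is pended like a lower one.
    pending.append((priority, vector))
    return False
```

A caller would then use the vector to look up the handler only when the function returns True, and revisit the pending list when the running task's priority drops.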


Debug Features 


The 80960MC has built-in debug capabilities. There are two types of
breakpoints and six different trace modes. The debug features are
controlled by two internal 32-bit registers, the Process-Controls
Word and the Trace-Controls Word. By setting bits in these control
words, a software debug monitor can closely control how the
processor responds during program execution.


The 80960MC has both hardware and software breakpoints. It provides
two hardware breakpoint registers on-chip which can be set by a
special command to any value. When the instruction pointer matches
the value in one of the breakpoint registers, the breakpoint will
fire, and a breakpoint handling routine is called automatically.


The 80960MC also provides software breakpoints through the use of
two instructions, MARK and FMARK. These instructions can be placed
at any point in a program and will cause the processor to halt
execution at that point and call the breakpoint handling routine.
The breakpoint mechanism is easy to use and provides a powerful
debugging tool.


Tracing is available for instructions (single-step execution), calls
and returns, and branching. Each different type of trace may be
enabled separately by a special debug instruction. In each case, the
80960MC executes the instruction first and then calls a trace
handling routine (usually part of a software debug monitor). Further
program execution is halted until the trace routine is completed.
When the trace event handling routine is completed, instruction
execution resumes at the next instruction. The 80960MC's tracing
mechanisms, which are implemented completely in hardware, greatly
simplify the task of testing and debugging software.


FAULT DETECTION 


The 80960MC has an automatic mechanism to handle faults. There are
ten fault types including trace, arithmetic, and floating-point
faults. When the processor detects a fault, it automatically calls
the appropriate fault handling routine and saves the current
instruction pointer and necessary state information to make
efficient recovery possible. The processor posts diagnostic
information on the type of fault to a Fault Record. Like interrupt
handling routines, fault handling routines are usually written to
meet the needs of a specific application and are often included as
part of the operating system or kernel.


For each of the ten fault types, there are numerous subtypes that
provide specific information about a fault. For example, a
floating-point fault may have its subtype set to an Overflow or
Zero-Divide fault. The fault handler can use this specific
information to respond correctly to the fault.
Interagent Communications (IAC)


In order to coordinate their actions, processors in a 
multiple processor system need a means for com- 
municating with each other. The 80960MC does this 
through a mechanism known as Interagent Commu- 
nication messages or IACs. 


IAC messages cause a variety of actions including starting and
stopping processors, flushing instruction caches and TLBs, and
sending interrupts to other processors in the system. The upper
16 Mbytes of the processor's physical memory space is reserved for
sending and receiving IAC messages.


BUILT-IN TESTABILITY 


Upon reset, the 80960MC automatically conducts an exhaustive
internal test of its major blocks of logic. Then, before executing
its first instruction, it does a zero checksum on the first eight
words in memory to ensure that the system has been loaded correctly.
If a problem is discovered at any point during the self-test, the
80960MC will assert its FAILURE pin and will not begin program
execution. The self-test takes approximately 47,000 cycles to
complete.


System manufacturers can use the 80960MC’s self- 
test feature during incoming parts inspection. No 
special diagnostic programs need to be written, and 
the test is both thorough and fast. The self-test ca- 
pability helps ensure that defective parts will be dis- 
covered before systems are shipped, and once in 
the field, the self-test makes it easier to distinguish 
between problems caused by processor failure and 
problems resulting from other causes. 


COMPATIBILITY WITH 80960K-SERIES 


Application programs written for the 80960K-Series 
microprocessors can be run on the 80960MC with- 
out modification. The 80960K-Series instruction set 
forms the core of the 80960MC's instructions, so bi- 
nary compatibility is assured. 
CHMOS

The 80960MC is fabricated using Intel's CHMOS IV (Complementary High
Speed Metal Oxide Semiconductor) process. This advanced technology
eliminates the frequency and reliability limitations of older CMOS
processes and opens a new era in microprocessor performance. It
combines the high performance capabilities of Intel's
industry-leading HMOS technology with the high density and low power
characteristics of CMOS. The 80960MC is available at 16, 20 and
25 MHz.


Table 4a. 80960MC Pin Description: L-Bus Signals


SYSTEM CLOCK provides the fundamental timing for 80960MC systems. It
is divided by two inside the 80960MC to generate the internal
processor clock. CLK2 is shown in Figure 9.


LOCAL ADDRESS/DATA BUS carries 32-bit physical addresses and data to
and from memory. During an address (Ta) cycle, bits 2-31 contain a
physical word address (bits 0-1 indicate SIZE; see below). During a
data (Td) cycle, bits 0-31 contain read or write data. The LAD lines
are active HIGH and float to a high impedance state when not active.


SIZE, which is comprised of bits 0-1 of the LAD lines during a Ta
cycle, specifies the size of a transfer in words for a burst
transaction.

    LAD1    LAD0    Transfer size
    0       0       1 Word
    0       1       2 Words
    1       0       3 Words
    1       1       4 Words


ADDRESS-LATCH ENABLE indicates the transfer of a physical address.
ALE is asserted during a Ta cycle and deasserted before the
beginning of the Td state. It is active LOW and floats to a high
impedance state when the processor is idle or is at the end of any
bus access.


ADDRESS STATUS indicates an address state. ADS is asserted every Ta
state and deasserted during the following Td state. For a burst
transaction, ADS is asserted again every Td state where READY was
asserted in the previous cycle.


WRITE/READ specifies, during a Ta cycle, whether the operation is a
write or read. It is latched on-chip and remains valid during Td and
Tw states.


DATA TRANSMIT/RECEIVE indicates the direction of data transfer to
and from the L-Bus. It is low during Ta, Tw and Td cycles for a read
or interrupt acknowledgement; it is high during Ta, Tw and Td cycles
for a write. DT/R never changes state when DEN is asserted (see
Timing Diagrams).


DATA ENABLE is asserted during Td and Tw cycles and indicates
transfer of data on the LAD bus lines.


READY indicates that data on LAD lines can be sampled or removed. If
READY is not asserted during a Td cycle, the Td cycle is extended to
the next cycle by inserting wait states (Tw), and ADS is not
asserted in the next cycle.


BUS LOCK prevents other bus masters from gaining control of the
L-Bus following the current cycle (if they would assert LOCK to do
so). LOCK is used by the processor or any bus agent when it performs
indivisible Read/Modify/Write (RMW) operations.

For a read that is designated as an RMW-read, LOCK is examined. If
asserted, the processor waits until it is not asserted; if not
asserted, the processor asserts LOCK during the Ta cycle and leaves
it asserted.

A write that is designated as an RMW-write deasserts LOCK in the Ta
cycle.


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain,
T.S. = three state

Ta = T(address), Td = T(data), Tw = T(wait), Tr = T(recovery),
Ti = T(idle), Th = T(hold)
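
The address-cycle encoding described for the LAD lines (bits 2-31 carry the word address, bits 0-1 carry SIZE) can be decoded as in the sketch below. The function name and the tuple it returns are hypothetical; the bit assignments and the SIZE table are taken from the pin description.

```python
def decode_address_cycle(lad: int) -> tuple:
    """Split the 32-bit LAD value seen during a Ta cycle.

    Returns (byte address of the first word, number of words in the burst).
    """
    if not 0 <= lad <= 0xFFFF_FFFF:
        raise ValueError("LAD value must fit in 32 bits")
    size_field = lad & 0b11            # LAD1:LAD0
    words = size_field + 1             # 00 -> 1 word ... 11 -> 4 words
    word_address = lad & 0xFFFF_FFFC   # bits 2-31, low two bits cleared
    return word_address, words
```

For example, a LAD value of 0x0000_1003 describes a four-word burst starting at address 0x0000_1000.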
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Table 4a. 80960MC Pin Description: L-Bus Signals (Continued) 


BYTE ENABLE LINES specify which data bytes (up to four) on the bus
take part in the current bus cycle. BE3 corresponds to LAD31-LAD24
and BE0 corresponds to LAD7-LAD0.

The byte enables are provided in advance of data. The byte enables
asserted during Ta specify the bytes of the first data word. The
byte enables asserted during Td specify the bytes of the next data
word (if any), that is, the word to be transmitted following the
next assertion of READY. The byte enables during the Td cycles
preceding the last assertion of READY are undefined. The byte
enables are latched on-chip and remain constant from one Td cycle to
the next when READY is not asserted.

For reads, the byte enables specify the byte(s) that the processor
will actually use. 80960MCs will assert only adjacent byte enables
(e.g., asserting just BE0 and BE2 is not permitted), and are
required to assert at least one byte enable. Accesses must also be
naturally aligned (e.g., asserting BE1 and BE2 is not allowed even
though they are adjacent). To produce address bits A0 and A1
externally, they can be decoded from the byte enables.


HOLD (HLDAR): HOLD indicates a request from a secondary bus master
to acquire the bus. If the processor is initialized as the primary
bus master this input will be interpreted as HOLD. When the
processor receives HOLD and grants another master control of the
bus, it floats its three-state bus lines, asserts HOLD ACKNOWLEDGE,
and enters the Th state. When HOLD is deasserted, the processor will
deassert HOLD ACKNOWLEDGE and go to either the Ti or Ta state.

HOLD ACKNOWLEDGE RECEIVED indicates that the processor has acquired
the bus. If the processor is initialized as the secondary bus master
this input is interpreted as HLDAR.

HOLD timing is shown in Figure 11.


HLDA (HOLDR): HOLD ACKNOWLEDGE relinquishes control of the bus to
another bus master. If the processor is initialized as the primary
bus master this output will be interpreted as HLDA. When HOLD is
deasserted, the processor will deassert HLDA and go to either the Ti
or Ta state.

HOLD REQUEST indicates a request to acquire the bus. If the
processor is initialized as the secondary bus master this output
will be interpreted as HOLDR.

HOLD timing is shown in Figure 11.


CACHE (O, T.S.): CACHE indicates if an access is cacheable during a
Ta cycle. The CACHE signal floats to a high impedance state when the
processor is idle.


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain,
T.S. = three state

Ta = T(address), Td = T(data), Tw = T(wait), Tr = T(recovery),
Ti = T(idle), Th = T(hold)
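
The byte-enable rules (at least one enable, adjacent enables only, naturally aligned) leave a small set of legal patterns, from which the low address bits can be decoded. The sketch below enumerates them, assuming that only 1-, 2- and 4-byte accesses occur. Bit n of the mask stands for "BEn is asserted" regardless of the electrical polarity of the pins, and the function name is hypothetical.

```python
# mask (bit n set = BEn asserted) -> (byte offset A1:A0, bytes transferred)
LEGAL_BYTE_ENABLES = {
    0b0001: (0, 1),
    0b0010: (1, 1),
    0b0100: (2, 1),
    0b1000: (3, 1),
    0b0011: (0, 2),   # aligned half-word in the low half
    0b1100: (2, 2),   # aligned half-word in the high half
    0b1111: (0, 4),   # full word
}


def decode_byte_enables(mask: int) -> tuple:
    """Return (A1:A0 as a byte offset, byte count) for a legal pattern."""
    try:
        return LEGAL_BYTE_ENABLES[mask]
    except KeyError:
        raise ValueError(f"illegal byte-enable pattern {mask:04b}") from None
```

The two patterns called out in the text as not permitted, BE0 with BE2 (0101) and BE1 with BE2 (0110), are both rejected by this table.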
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Table 4b. 80960MC Pin Description: Module Support Signals 


BAD ACCESS, if asserted in the cycle following the one in which the last READY 
of a transaction is asserted, indicates that an unrecoverable error has occurred on 
the current bus transaction, or that a synchronous load/store instruction has not 
been acknowledged. 


- STARTUP: During system reset, the BADAC signal is interpreted differently. If the 
signal is high, it indicates that this processor will perform system initialization. If it 
-Is low, another processor in the system will perform system initialization instead. 


_ RESET clears the internal logic of the processor and causes it to re-initialize. 


During RESET assertion, the input pins are ignored (except for BADAC and 
[AC/INTo), the tri-state output pins are placed in a high impedance state, and 
other output pins are placed in their non-asserted state. 


RESET must be asserted for at least 41 CLK2 cycles for a predictable RESET. 
The HIGH to LOW transition of RESET should occur after the rising edge of both 
CLK2 and the external bus CLK, and before the next rising edge of CLK2. 


RESET timing is shown in Figure 10. 


FAILURE: INITIALIZATION FAILURE indicates that the processor has failed to initialize
correctly. After RESET is deasserted and before the first bus transaction begins,
FAILURE is asserted while the processor performs a self-test. If the self-test
completes successfully, then FAILURE is deasserted. Next, the processor
performs a zero checksum on the first eight words of memory. If it fails, FAILURE
is asserted for a second time and remains asserted; if it passes, system
initialization continues and FAILURE remains deasserted.


N.C.: NOT CONNECTED indicates pins should not be connected. Never connect any
pin marked N.C.


IAC/INT0: INTERAGENT COMMUNICATION REQUEST/INTERRUPT 0 indicates either
that there is a pending IAC message for the processor or an interrupt. The bus
interrupt control register determines in which way the signal should be interpreted.
To signal an interrupt or IAC request in a synchronous system, this pin (as well as
the other interrupt pins) must be enabled by being deasserted for at least one bus
cycle and then asserted for at least one additional bus cycle; in an asynchronous
system, the pin must remain deasserted for at least two bus cycles and then be
asserted for at least two more bus cycles.


LOCAL PROCESSOR NUMBER: This signal is interpreted differently during
system reset. If the signal is at a high voltage level, it indicates that this processor
is a primary bus master (Local Processor Number = 0); if it is at a low voltage
level, it indicates that this processor is a secondary bus master (Local Processor
Number = 1).


INT1: INTERRUPT 1, like INT0, provides direct interrupt signaling.


INT2/INTR: INTERRUPT 2/INTERRUPT REQUEST: The bus interrupt control register determines
how this pin is interpreted. If INT2, it has the same interpretation as the INT0 and
INT1 pins. If INTR, it is used to receive an interrupt request from an external
interrupt controller.


INT3/INTA: INTERRUPT 3/INTERRUPT ACKNOWLEDGE: The bus interrupt control register
determines how this pin is interpreted. If INT3, it has the same interpretation as
the INT0, INT1, and INT2 pins. If INTA, it is used as an output to control
interrupt-acknowledge bus transactions. The INTA output is latched on-chip and remains
valid during Td cycles; as an output, it is open-drain.


I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = three state
Ta = Taddress, Td = Tdata, Tw = Twait, Tr = Trecovery, Ti = Tidle, Th = Thold
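
The enable-then-assert rule given above for the interrupt pins can be expressed as a small checker. The following is only a sketch for bench or simulation use: the function name and the convention of one boolean sample per bus cycle (True meaning asserted) are assumptions made for illustration, not part of the device specification.

```python
def pulse_is_recognized(samples, synchronous=True):
    """Return True if `samples` contains a valid enable-then-assert sequence.

    `samples` holds one boolean per bus cycle: True means the pin is
    asserted, False means deasserted (a modelling convention chosen here).
    Synchronous systems need at least 1 deasserted cycle followed by at
    least 1 asserted cycle; asynchronous systems need at least 2 and 2.
    """
    need = 1 if synchronous else 2
    low_run = 0     # consecutive deasserted cycles seen so far
    high_run = 0    # consecutive asserted cycles after a sufficient low run
    armed = False   # True once the pin has been deasserted long enough
    for asserted in samples:
        if not asserted:
            low_run += 1
            high_run = 0
            armed = low_run >= need
        else:
            if armed:
                high_run += 1
                if high_run >= need:
                    return True
            low_run = 0
    return False


# A one-cycle pulse is enough in a synchronous system, but not in an
# asynchronous one.
sync_ok = pulse_is_recognized([False, True])
async_ok = pulse_is_recognized([False, True, True], synchronous=False)
```

A waveform captured with a logic analyzer can be reduced to per-cycle samples and passed through the same function to confirm that external interrupt logic holds the pin long enough.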
ELECTRICAL SPECIFICATIONS
Power and Grounding


The 80960MC is implemented in CHMOS III technology and has modest power
requirements. Its high clock frequency and numerous output buffers
(address/data, control, error and arbitration signals) can cause power surges
as multiple output buffers drive new signal levels simultaneously. For clean
on-chip power distribution at high frequency, 12 Vcc and 13 Vss pins
separately feed functional units of the 80960MC.


Power and ground connections must be made to all 
power and ground pins of the 80960MC. On the cir- 
cuit board, all Vcc pins must be strapped closely 
together, preferably on a power plane. Likewise, all 
Vss pins should be strapped together, preferably on 
a ground plane. 


Power Decoupling Recommendations 


Liberal decoupling capacitance should be placed 
near the 80960MC. The processor can cause tran- 
sient power surges when driving the L-Bus, particu- 
larly when it is connected to a large capacitive load. 


Low inductance capacitors and interconnects are 
recommended for best high frequency electrical per- 
formance. Inductance can be reduced by shortening 
the board traces between the processor and decou- 
pling capacitors as much as possible. 


Connection Recommendations 


For reliable operation, always connect unused inputs to an appropriate signal
level. In particular, if one or more interrupt lines are not used, they should
be pulled up or down to their respective deasserted states. No inputs should
ever be left floating.

Low Drive Network (80960MC open-drain output): VOH = 3.42V, IOL = 25.3 mA


All open-drain outputs require a pullup device. While in some cases a simple
pullup resistor will be adequate, we recommend a network of pullup and
pulldown resistors biased to a valid VIH (≥ 3.4V) and terminated in the
characteristic impedance of the circuit board. Figure 6 shows our
recommendations for the resistor values for both a low and high current
drive network, which assumes that the circuit board has a characteristic
impedance of 100Ω. The advantage of terminating the output signals in this
fashion is that it limits signal swing and reduces AC power consumption.
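
As a rough illustration of how such a pullup/pulldown network can be sized, the sketch below solves for the two resistors whose Thevenin equivalent has a chosen bias voltage and a parallel resistance equal to the board impedance. The function name is hypothetical, and the resulting values are a textbook calculation for a 3.4V bias and 100Ω board; they are not the resistor values recommended in Figure 6.

```python
def split_termination(vcc, v_bias, z0):
    """Return (r_pullup, r_pulldown) in ohms.

    The pullup goes to vcc and the pulldown to ground. Their Thevenin
    equivalent is a source of v_bias volts behind z0 ohms:
        v_bias = vcc * r_pulldown / (r_pullup + r_pulldown)
        z0     = r_pullup * r_pulldown / (r_pullup + r_pulldown)
    """
    if not 0 < v_bias < vcc:
        raise ValueError("bias voltage must lie strictly between 0 and vcc")
    r_pullup = z0 * vcc / v_bias
    r_pulldown = z0 * vcc / (vcc - v_bias)
    return r_pullup, r_pulldown


# Example: 5V supply, 3.4V bias, 100 ohm board impedance.
r1, r2 = split_termination(vcc=5.0, v_bias=3.4, z0=100.0)
parallel = r1 * r2 / (r1 + r2)     # should come back to 100 ohms
bias = 5.0 * r2 / (r1 + r2)        # should come back to 3.4V
```

With these inputs the pullup works out to roughly 147Ω and the pulldown to roughly 312Ω; standard resistor values near those would shift the bias and impedance slightly.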


Characteristic Curves 


Figure 7 shows the typical supply current requirements over the operating
temperature range of the processor at supply voltage (Vcc) of 5V. Figure 8
shows the typical power supply current (Icc) required by the 80960MC at
various operating frequencies when measured at three input voltage (Vcc)
levels.


Figure 9 shows the typical capacitive derating curve for the 80960MC measured
from 1.5V on the system clock (CLK) to 0.8V on the falling edge and 2.0V on
the rising edge of the L-Bus address/data (LAD) signals.


High Drive Network (80960MC open-drain output): VOH = 3.41V, IOL = 33.8 mA


Figure 6. Connection Recommendations for Low and High Current Drive Networks
Figure 7. Typical Supply Current (Icc)
(Power supply current in mA versus case temperature in °C; data points taken at -60, -5, 25, 95, 130°C)

Figure 8. Typical Current vs Frequency
(Power supply current in mA versus operating frequency in MHz, at 4.5V, 5.0V, and 5.5V)


Test Load Circuit


Figure 10 illustrates the load circuit used to test the 80960MC's tristate
pins, and Figure 11 shows the load circuit used to test the open drain
outputs. The open drain test uses an active load circuit in the form of a
matched diode bridge. Since the open-drain outputs sink current, only the IOL
legs of the bridge are necessary and the IOH legs are not used. When the
80960MC driver under test is turned off, the output pin is pulled up to VREF
(i.e., VOH). Diode D1 is turned off and the IOL current source flows through
diode D2.


When the 80960MC open-drain driver under test is on, diode D1 is also on, and
the voltage on the pin being tested drops to VOL. Diode D2 turns off and IOL
flows through diode D1.


Figure 9. Capacitive Derating Curve
(Tristate output valid delay in ns versus capacitive load in pF)
Figure 10. Test Load Circuit for TRI-STATE Output Pins

Figure 11. Test Load Circuit for Open-Drain Output Pins
ABSOLUTE MAXIMUM RATINGS*

Case Temperature under Bias(6) ......... -55°C to +125°C
Storage Temperature .................... -65°C to +150°C
Voltage on Any Pin ..................... -0.5V to Vcc + 0.5V
Power Dissipation ...................... 2.6W (25 MHz)

NOTICE: This data sheet contains information on products in the sampling and
initial production phases of development. The specifications are subject to
change without notice. Verify with your local Intel Sales office that you have
the latest data sheet before finalizing a design.

*WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause
permanent damage. These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended exposure beyond the
"Operating Conditions" may affect device reliability.


D.C. CHARACTERISTICS
80960MC: Tcase(6) = -55°C to +125°C, Vcc = 5V ±5%


Input Low Voltage 


Input Leakage Current 


Output Leakage Current 
Input Capacitance 


Thermal Resistance 
(Junction-to-Ambient) 

Pin Grid Array 

Ceramic Quad Flatpack 


Thermal Resistance 
(Junction-to-Case) — 

Pin Grid Array 

Ceramic Quad Flatpack | 


NOTES:

1. For three-state outputs, this parameter is measured at:
   Address/Data ........ 4.0 mA
   Controls ............ 9.0 mA

2. This parameter is measured at:
   Address/Data ........ -1.0 mA
   Controls ............ -0.9 mA
   ALE ................. -5.0 mA

3. Input, output, and clock capacitance are not tested.

4. Not measured on open-drain outputs.

5. For open-drain outputs ........ 25 mA

6. Case temperatures are "instant on".
AC SPECIFICATIONS


This section describes the AC specifications for the 80960MC pins. All input
and output timings are specified relative to the 1.5V level of the rising edge
of CLK2, and refer to the time at which the signal reaches (for output delay
and input setup) or leaves (for hold time) the TTL levels of LOW (0.8V) or
HIGH (2.0V). All AC testing should be done with input voltages of 0.4V and
2.4V, except for the clock (CLK2), which should be tested with input voltages
of 0.45V and 0.55 Vcc.


OUTPUTS: ADS, W/R, DEN, BE3-BE0, HLDA/HOLDR, CACHE, LOCK, INTA, DT/R
INPUTS: LAD31-LAD0, BADAC, IAC/INT0, INT1, INT2/INTR, INT3, HOLD, HLDAR, LOCK, READY
Levels shown for valid output and valid input: 2.0V and 0.8V


NOTE 1:
For Tri-State pins, Tg and Tg are measured at 1.5V.
For Open-Drain pins, Tg is measured at 1.5V, Tg at 0.8V.


Figure 12. Drive Levels and Timing Relationships for 80960MC Signals

Figure 13. Timing Relationship of L-Bus Signals
A.C. Specification Tables
80960MC A.C. Characteristics (16 MHz)
Tcase(3) = -55°C to +125°C, Vcc = 5V ±5%


| Parameter | Min | Max | Test Conditions |
| Processor Clock Period (CLK2) | 31.25 | 125 | VIN = 1.5V |
| Processor Clock Low Time (CLK2) | | | VIL = 10% Point = 1.2V |
| Processor Clock High Time (CLK2) | | | VIH = 90% Point = 0.1V + 0.5 Vcc |
| Processor Clock Fall Time (CLK2) | | | VIN = 90% Point to 10% Point |
| Processor Clock Rise Time (CLK2) | | | VIN = 10% Point to 90% Point |
| Output Valid Delay | | | CL = 100 pF (LAD), CL = 75 pF (Controls) |


HOLDA Output 
Valid Delay 

ALE Width 

ALE Invalid Delay 


Output Float 
Delay 

HOLDA Output 
Float Delay 
Input Setup 1 
Input Hold. 


HOLD Input Hold 
Input Setup 2 


Setup to ALE 
Inactive 

Hold after ALE 
Inactive 


C. = 100 pF (LAD) . 
C. = 75 pF (Controls)(2) 


Ci = 100 pF (LAD) 
CL = 75 pF (Controls) 


CL = 100 pF (LAD) 
C. = 75 pF (Controls) 


Reset Hold
Reset Setup
Reset Width 1281
NOTES:

1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.

2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.

3. Case temperatures are "instant on".
A.C. Specification Tables (Continued)
80960MC A.C. Characteristics (20 MHz)
Tcase(3) = -55°C to +125°C, Vcc = 5V ±5%


| Symbol | Parameter | Min | Max | Units | Test Conditions |
| T1 | Processor Clock Period (CLK2) | 25 | 125 | ns | VIN = 1.5V |
| T2 | Processor Clock Low Time (CLK2) | | | ns | VIL = 10% Point = 1.2V |
| T3 | Processor Clock High Time (CLK2) | | | ns | VIH = 90% Point = 0.1V + 0.5 Vcc |
| T4 | Processor Clock Fall Time (CLK2) | | | ns | VIN = 90% Point to 10% Point |
| T5 | Processor Clock Rise Time (CLK2) | | | ns | VIN = 10% Point to 90% Point |
| T6 | Output Valid Delay | 2 | | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
HOLDA Output Valid Delay 4
CL = 50 pF
CL = 50 pF
CL = 50 pF(2)
CL = 60 pF (LAD)
CL = 50 pF (Controls)(2)
CL = 50 pF
ALE Width 12
ALE Invalid Delay
Output Float Delay 2
Float Delay
(Note 1)
CL = 60 pF (LAD)
CL = 50 pF (Controls)
CL = 60 pF (LAD)
CL = 50 pF (Controls)
Setup to ALE Inactive 10
Hold after ALE Inactive
T15 Reset Hold 3
T16 Reset Setup 5
Reset Width 1025 (41 CLK2 Periods Minimum)


NOTES:

1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Case temperatures are "instant on".


A.C. Specification Tables (Continued)
80960MC A.C. Characteristics (25 MHz)
Tcase(3) = -55°C to +125°C, Vcc = 5V ±5%


Test Conditions 


VIL = 10% Point = 1.2V
VIH = 90% Point = 0.1V + 0.5 Vcc
10 | VIN = 90% Point to 10% Point
10 | VIN = 10% Point to 90% Point

CL = 60 pF (LAD)
CL = 50 pF (Controls)


I 
he ie = ne 4 a -C, = 50 pF (Controls)(2) 
~ Float Delay - fats (eee ee i | 
| Tio | inputsewp1 | 38 | -_ 
| Ti | inputHoig | | 


- Setup to ALE 
Inactive 
. Hold after ALE 
~ Inactive 


: 50pF 


CL = 60 pF (LAD) 
~C. = 50 pF (Controls) 

CL =-60 pF (LAD) 

C_ = 50 pF (Controls) 

41 CLK2 Periods Minimum 


NOTES:

1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.

2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Case temperatures are "instant on".


- Reset Hold 3 
- Reset Setup - a 
CLK2 input levels: HIGH LEVEL (MIN) 0.55 Vcc, LOW LEVEL (MAX) 0.8V


INIT parameters (BADAC, IAC) must be set up 8 clocks prior to the indicated
CLK2 edge and must be held beyond the indicated CLK2 edge.
T15 = RESET HOLD
T16 = RESET SETUP


Figure 15. RESET Signal Timing
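
The 41-cycle RESET requirement can be converted to a minimum pulse width for each speed grade. The sketch below assumes CLK2 runs at twice the rated frequency, which matches the 31.25 ns and 25 ns minimum CLK2 periods listed for the 16 MHz and 20 MHz parts; the function names are illustrative only.

```python
RESET_MIN_CLK2_CYCLES = 41  # minimum RESET assertion, in CLK2 cycles


def clk2_period_ns(rated_mhz):
    """CLK2 period in ns, assuming CLK2 runs at twice the rated frequency."""
    return 1000.0 / (2.0 * rated_mhz)


def min_reset_width_ns(rated_mhz):
    """Shortest RESET pulse, in ns, that still spans 41 CLK2 cycles."""
    return RESET_MIN_CLK2_CYCLES * clk2_period_ns(rated_mhz)


for mhz in (16, 20, 25):
    print(f"{mhz} MHz: CLK2 period {clk2_period_ns(mhz)} ns, "
          f"minimum RESET width {min_reset_width_ns(mhz)} ns")
```

At 16 MHz this gives 1281.25 ns and at 20 MHz 1025 ns. A system that runs CLK2 slower than the minimum period needs a proportionally longer RESET pulse, since the requirement is stated in cycles rather than in time.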
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| DELAY OF S5ns MINIMUM ~ 
IS REQUIRED 
Figure 16. Hold Timing


Design Considerations 


Input hold times can be disregarded by the designer whenever the input is
removed because a subsequent output from the processor is deasserted (e.g.,
DEN becomes deasserted).


In other words, whenever the processor generates an output that indicates a
transition into a subsequent state, the processor must have sampled any
inputs for the previous state.


Similarly, whenever the processor generates an output that indicates a
transition into a subsequent state, any outputs that are specified to be
three stated in this new state are guaranteed to be three stated.


Designing for the ICE-960MC 
The 80960MC In-Circuit Emulator assists in debugging 80960MC hardware and
software designs. The product consists of a probe module, cable, and control
unit. Because of the high operating frequency of 80960MC systems, the probe
module connects directly to the 80960MC socket.


When designing an 80960MC hardware system that uses the ICE-960MC to debug
the system, several electrical and mechanical characteristics should be
considered. These considerations include capacitive loading, drive
requirement, power requirement, and physical layout.


The ICE-960MC probe module increases the load capacitance of each line by up
to 25 pF. It also adds one standard Schottky TTL load on the CLK2 line, up to
one advanced low-power Schottky TTL load for each control signal line, and one
advanced low-power Schottky TTL load for each address/data and byte enable
line. These loads originate from the probe module and are driven by the
80960MC processor.


To achieve high noise immunity, the ICE-960MC probe is powered by the user's
system. The high-speed probe circuitry draws up to 1.1A plus the maximum
current (Icc) of the 80960MC processor.
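
The loading and power figures above lend themselves to a simple budget check when planning a board that will host the probe. The sketch below is illustrative only: the function names are hypothetical, and the processor Icc and board load passed in the example are placeholder numbers, not values taken from this data sheet.

```python
ICE_PROBE_MAX_CURRENT_A = 1.1   # stated upper bound for the probe circuitry
ICE_PROBE_MAX_EXTRA_PF = 25.0   # stated upper bound on added load per line


def supply_current_budget_a(processor_icc_max_a):
    """Worst-case current the user's supply must deliver with the probe fitted."""
    return ICE_PROBE_MAX_CURRENT_A + processor_icc_max_a


def line_load_with_probe_pf(board_load_pf):
    """Worst-case capacitive load on one line once the probe is attached."""
    return board_load_pf + ICE_PROBE_MAX_EXTRA_PF


# Placeholder inputs: a 0.5A processor Icc and a 40 pF board load.
budget = supply_current_budget_a(0.5)
load = line_load_with_probe_pf(40.0)
```

Comparing the resulting line load against the capacitive load at which the A.C. timings are specified gives a first indication of whether output delays should be derated using the capacitive derating curve.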


The mechanical considerations are shown in Figure 
17, which illustrates the lateral clearance require- 
ments for the ICE-960MC probe as viewed from 
above the socket of the 80960MC processor. 
Figure 17. ICE-960MC Lateral Clearance Requirements
(View from above the user CPU socket. Labels: user CPU socket, processor under emulation, ICE processor module, ribbon cable connector, cable to ICE control unit. Vertical clearance: 1.2". Minimum cable bend radius: less than 3.0".)


MECHANICAL DATA


Pin Assignment


The 80960MC is packaged in a 132-lead ceramic pin grid array and a 164-lead
ceramic quad flatpack. The 80960MC pin grid array pinout as viewed from the
substrate side of the component is shown in Figure 18 and from the pin side in
Figure 19. The 80960MC ceramic quad flatpack pinout as viewed from the top of
the package is shown in Figure 20.


Vcc and GND connections must be made to multiple Vcc and GND pins. Each Vcc
and GND pin must be connected to the appropriate voltage or ground and
externally strapped close to the package. Preferably, the circuit board
should include power and ground planes for power distribution. Tables 5, 6, 7
and 8 list the function of each pin.


NOTE:
Pins identified as N.C., "No Connect," should never be connected under any
circumstances.


Package Dimensions and Mounting


Pins in the pin grid array package are arranged 0.100 inch (2.54 mm)
center-to-center, in a 14 by 14 matrix, three rows around. (See Figure 21.)


A wide variety of available sockets allow low-insertion or zero-insertion
force mountings, and a choice of terminals such as soldertail, surface mount,
or wire wrap. Several applicable sockets are shown in Figure 22.


Package Thermal Specification 


The 80960MC is specified for operation when its case temperature is within
the range of -55°C to +125°C. The PGA case temperature should be measured at
the center of the top surface opposite the pins as shown in Figure 23. The
ceramic quad flatpack case temperature should be measured at the center of
the lid on the top surface of the package.


WAVEFORMS


Figures 24 through 30 show the waveforms for various transactions on the
80960MC's local bus.
Figure 18. MG80960MC Pinout—View from Top (Pins Facing Down)

Figure 19. MG80960MC Pinout—View from Bottom (Pins Facing Up)

Figure 20. MQ80960MC Pinout—View from Top of Package
(Staggered pin arrangement is shown for clarity only. Actual package has pins of equal length.)
Table 5. MG80960MC (PGA) Pinout—In Pin Order

NOTE:
Pins identified as N.C. ("No Connect") should never be connected under any circumstances.
Table 6. MG80960MC (PGA) Pinout—In Signal Order

| Signal | Pin |
| RESET | B13 |

NOTE:
Pins identified as N.C. ("No Connect") should never be connected under any circumstances.
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Table 7. MQ80960MC (CQP) Pinout—In Pin Order 

| Pin | Signal | Pin | Signal | Pin | Signal | Pin | Signal | 
BE 
READY 
«| Be; | 45 | Ado | 66 | NO | v7 [| NC 
| 6 | oR | 47 | tADs | e8 | NC | 129 | NC 
[7 | tads, | 48 | LADs | 69 | NG | 190 | NG 
[es | wr | 49 | tads | 9 | NO | 131 | NO 
9 | tADs | 60 | tad, | 91 | NO | 12 | NO 
P14 | tADes | 85 | LAD, | 96 | No | 137 | NG 
ei NG ee IN Oo NC a es, 
Vss Voc ra 
[19 | Voc | 60 | Ves | tor | No | 142 | NG | 
Vss Voc | 102 | NG | m3 | No | 
| 23 | Vss Vss_|_—105 146 
Ves | 66 | Vss 107 
Veo Veo rag | NG. 
ar_| Hob | 68 | No | 19 | No | 160 | NG 
p28 | SADAG | 69 | NG | 10 | NO | 1 | NO 
[20 | tad, | 70 | NG | m1 | NO | 12 | NG | 
Voc 

NC. 
NG. 
[36 | LAD | 76 | NG. | ‘7 | NO 
LOOK 

[os | 1Adis [79 | No. | 120 | voc | 161 | FAIL 

ew TS ae eS Va DEN 

[>a [tad | 81 | NC | 122 | NG | 163 | BED 
164 [Ves 


NOTE: . 
Pins identified as N.C. (‘No Connect’) should never be connected under any circumstances. 
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Table 8. MQ80960MC (CQP) Pinout—In Signal Order 


| Signal | Pin | Signal | Pin 
z Nc. | 48 
ALE [7a | tADeg | 20 NG} 149 _| 
SADAC LAD 2s CNG. | 150 
BE, i 
“BE, ee 
Be; +| 2 | 1AD | 9 | NG | 109 | NG | 156 
wo | NG | 2 | READY | 3 
Pore} 6 | NG | 7 | NG | 119 | reser | 67 _| 
FAURE | _i6t_| Nc |e | No | 14 | Voc 
[ Hupa/HoLbR [ie | Nc | 69 | NG | 115 | Vcc | 21 
[cinty [7 | nc | 7m | NG | 17 | Voc 
es 
INTa/INTA 7 | No. | 122 | Voc 
NC Voc 
[ta, i ot NG dt NG. | aa Voc «Ysa? 
Pane iY | NG. | ve | NG | tar | Voc | o4 
[tans | s¢ | NG | 79 | NG | 128 100 
[-taps +) ef NG |e | NG | 120 | Voc. | 126 | 
Vis 
Vis 
apy Sid aT NG | ONG 196 ~*«|:~CNes ~—=C«|=C 
tad | 43 | NG | 90 | NG | 17 | Veg | 62 
[tad | tf NG | 9 | NG | 198 | Vos 
[poy id ao NG oe | NG | 189«|~Ces iC 
140 | Vs 
tabs | _a7_| NG | 96 | No | 143 | Ves | 125 
145 | _Vss 
[tad +f _a2_ | NG | 09 | NG | 146 | Ves | 164 | 
47 | WR | 8 
NOTE: 


Pins identified as N.C. (“No Connect’) should never be connected under any circumstances. | 
Figure 21. A 132-Lead Pin-Grid Array (PGA) Used to Package the MG80960MC

AMP LIF SOCKET 55274-1
e Low insertion force (LIF) soldertail 55274-1
e Amp tests indicate 50% reduction in insertion force compared to machined sockets

Other socket options:
e Zero insertion force (ZIF) soldertail 55583-1
e Zero insertion force (ZIF) Burn-in version 55573-2

AMP ZIF SOCKET 55583-1
Cam handle locks in low profile position when MG80960MC is installed
(handle UP for open and DOWN for closed positions).

Amp Incorporated
(Harrisburg, PA 17105 U.S.A.
Phone 717-564-0100)

Courtesy Amp Incorporated


Peel-A-Way* Mylar and Kapton Socket Terminal Carriers
e Low insertion force surface mount CS132-37TG
e Low insertion force soldertail CS132-01TG
e Low insertion force wire-wrap CS132-02TG (two-level), CS132-03TG (three-level)
e Low insertion force press-fit

Peel-A-Way Carrier No. 132:
Kapton Carrier is KS132
Mylar Carrier is MS132
Molded Plastic Body KS132 is shown (14 x 14 x 3 rows).

Advanced Interconnections
(5 Division Street
Phone 401-885-0485)

Courtesy Advanced Interconnections
(Peel-A-Way Terminal Carriers
U.S. Patent No. 4442938)


*Peel-A-Way is a trademark of Advanced Interconnections.


Figure 22. Several Socket Options for Mounting the MG80960MC
Figure 23. Measuring MG80960MC PGA Case Temperature (Tc)
(132-pin PGA: measure case temperature at center of top surface)

Figure 24. System and Processor Clock Relationship

Figure 25. Read Transaction

Figure 26. Write Transaction with One Wait State

Figure 27. Burst Read Transaction

Figure 28. Burst Write Transaction with One Wait State

Figure 29. Interrupt Acknowledge Transaction

Figure 30. Bus Exchange Transaction (PBM = Primary Bus Master, SBM = Secondary Bus Master)
Revision History


1. 20 MHz timing specifications were added. 
2. Pin 158, ceramic quad pack, (see Figure 20) changed from NC (No Connect) to Vss. 
M82965
FAULT TOLERANT BUS EXTENSION UNIT
Military

■ Multiprocessor Support
  - Connect up to 32 Processor and Memory Modules in a Single System
■ Multiple Bus Support with No External Logic
  - Connect up to Four 32-Bit Buses for High-Bandwidth Access to Interleaved Memory
■ Software-Transparent Fault Tolerance
  - Recover from a Single-Point Failure in a Module or Bus without Affecting Program Execution
■ Cache Control Support
  - Provides Directory, Coherency Logic, and Control Signals for a Two-Way Set-Associative Cache
  - Single BXU Supports 16 Kbytes
  - Combine up to Four BXUs to Support 64 Kbytes
■ Message Passing
  - Supports Interagent Communication
  - Redundant Error Reporting Network
■ Two I/O Prefetch Channels
  - Provides High-Bandwidth, Low Latency Access to Memory or I/O for Sequential Transfers
■ Memory Module Support
  - Interfaces Discrete Memory Controller and DRAM Array to AP-Bus
■ Advanced CHMOS III Technology
■ Advanced Package Technology
  - 132 Lead Ceramic Pin Grid Array
  - 164 Lead Ceramic Quad Flatpack
■ Military Temperature Range: -55°C to +125°C (Tc)


The M82965 Bus Extension Unit (BXU) is the key to building multiprocessor and fault-tolerant systems with the
80960MC 32-bit microprocessor. BXUs connect to each other in an expandable matrix that can support up to
32 processor and memory modules in a single, high-performance system. No external interface logic is
required. The BXU increases overall system performance by providing hardware support for local caches, I/O
prefetch, message passing, and multiprocessor arbitration. Through redundant modules, fault-tolerant systems
based on the BXU can sustain a single-point failure and then reconfigure themselves automatically, while
application programs continue undisrupted. Truly a VLSI building block, the M82965 BXU supports a wide
range of fault tolerance and performance options to meet a diverse set of cost, performance, and reliability
needs.

Figure 1. M82965 Block Diagram
FUNCTIONAL OVERVIEW


The M82965 Bus Extension Unit (BXU) is the key component in building
multiprocessor and fault-tolerant system designs with the 80960MC 32-bit
microprocessor. Its primary function is to connect the Local bus (L-Bus) of a
system module to a system-wide bus called the Advanced Processor Bus
(AP-Bus), allowing the system to expand incrementally as each new module or
AP-Bus is added.


Several important features are provided within the BXU which streamline
80960MC multiprocessor system operation. To increase the available system bus
bandwidth, multiple BXUs can be employed within each system module to support
up to four AP-Buses. To reduce AP-Bus traffic, BXU components can directly
support a two-way set-associative cache. I/O prefetch channels are
incorporated within each BXU to reduce the time necessary to transfer large
blocks of data from shared system memory or I/O. BXUs support
processor-to-processor communication by recognizing, storing, and exchanging
Interagent Communication (IAC) messages with other BXUs along the AP-Bus.
Requests for access to the AP-Bus are resolved through BXU arbitration logic
which ensures that no system modules will suffer from resource starvation.


BXUs support fault tolerant system operation through several mechanisms used
to detect, isolate and recover from hardware errors. Paired BXUs monitor each
other's operation on a cycle-by-cycle basis through a method called
Functional Redundancy Checking (FRC). Errors on the AP-Bus are detected
through interlaced parity bits on the address/data and control lines, signal
duplication on the transaction control lines, and a bus timer used to monitor
the bus for non-response to a request. Recovery mechanisms include the
capability to marry FRC modules in a primary-shadow pair (Quad Modular
Redundancy), so that if either fails, the surviving spouse can take over
operations immediately. Transient errors on the AP-Bus are automatically
retried, and in the case of permanent errors, the failed bus is disabled and
all memory accesses switched to a backup bus.


MULTIPROCESSOR SUPPORT 


A multiprocessor 80960MC system is composed of 
a set of modules connected to an AP-Bus. Figure 2 
shows the three possible types of modules: active, 
passive, and the combination of both an active and 
passive module. Active modules contain up to two 
80960MC processors, cache or private memory, and 
a BXU. Passive modules contain a memory array 
and controller and a BXU. Active/Passive modules 
contain either processors and global memory, or 
master and slave I/O devices. 


Figure 2. Types of Modules
(Active module: 80960MC CPU, cache/private memory, BXU. Passive module: memory and controller, BXU. Active/passive module: master I/O. All modules attach to the AP-Bus.)
Local Bus


In a multiprocessor system each module has its own Local Bus (L-Bus), which
is typically confined to a single board. The L-Bus is provided to
interconnect components within a module. It is a 32-bit multiplexed,
synchronous bus with a maximum bandwidth of 43 Mbytes per second at 16 MHz.
It has been designed to interface with standard support components using
minimal glue logic. The L-Bus uses HOLD/HOLDA for arbitration with bus slaves
and LOCK for signaling indivisible operations. A READY signal can be used to
lengthen bus transactions.
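
The 43 Mbytes per second figure can be reproduced with a short calculation. The sketch below uses one accounting that yields it: a four-word burst occupying one address cycle, four data cycles, and one recovery cycle with no wait states. That cycle breakdown is an assumption made here for illustration and is not stated in the text above.

```python
def burst_bandwidth_mbytes_per_s(bus_mhz, words=4, bytes_per_word=4,
                                 overhead_cycles=2):
    """Peak bandwidth of back-to-back bursts on a multiplexed bus.

    Each burst moves `words` words and is assumed to cost one cycle per
    word plus `overhead_cycles` (here: one address and one recovery cycle).
    """
    cycles_per_burst = words + overhead_cycles
    bytes_per_burst = words * bytes_per_word
    return bus_mhz * bytes_per_burst / cycles_per_burst


peak_16mhz = burst_bandwidth_mbytes_per_s(16)   # about 42.7, quoted as 43
```

Each wait state inserted with READY adds a cycle to the burst, so a single wait state per data word would raise the cycle count from 6 to 10 and cut the figure to about 25.6 Mbytes per second under the same assumptions.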


Local Bus protocol permits both primary and secondary bus masters to coexist
on the bus (often a processor and a DMA, or occasionally two processors). A
secondary bus master must obtain use of the L-Bus from the bus master through
the use of HOLDR/HOLDAR. A BXU is always used as a master in a memory module
and is generally used as a slave in a processor module. Fifty BXU pins are
dedicated to L-Bus and module support operations (including cache control).
The L-Bus control registers are shown in Table 1.


Table 1. L-Bus Control Registers

| Register | Description |
| Physical-ID (Local) | This register contains a unique identifier for a specific BXU on the L-Bus. It corresponds to the AP-Bus Physical-ID register. |
| Logical-ID (Local) | This register holds the Logical-ID of the BXU. It corresponds to the AP-Bus Logical-ID register. |
| LBI Control | This is the major control register for BXU functions on the L-Bus. It is used to set the interleaving factor for the cache, determines if the BXU should act as a master on the L-Bus, and indicates whether the BXU is in memory or processor mode. |
| System Bus ID | This register uniquely identifies the BXU as attached to one of four AP-Buses. |
| Local-Bus Test | This register allows system diagnostics to check on the type of recognition that was done on the previous L-Bus request. |
| Match 0 | The contents of this register determine which bits in the L-Bus address should be recognized by the BXU. This register provides a base address for a partition of memory recognized by the BXU. |
| | The contents of this register determine if certain bits in the Match 0 register should be ignored (i.e., marked "don't care") during address recognition. |
| | Private memory mask register. |
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Advanced Processor Bus 


A highly optimized multiprocessing bus called the 


Advanced Processor Bus (AP-Bus) interconnects 
80960MC system modules. The AP-Bus is synchro- 
nous, in that all components in the system, including 
processors and BXUs, are driven by the same clock 
edge. It is a 32-bit multiplexed bus with a maximum 
bandwidth of 43 Mbytes per second at 16 MHz. 


Transactions over the AP-Bus are encoded into 
pairs of request and reply packets. A request packet 
defines the operation, amount of data, and the loca- 
tion (or address) where the transaction will occur. In 
the case of a write request, the packet will also in- 
clude data. The reply packet indicates whether or 
not the action completed successfully, and in the 
case of read replies, will also include the requested 
data. Table 2 lists the various types of AP-Bus oper- 
ations. 


The AP-Bus supports a pipelining feature that allows 
up to three requests to be pending at any time. Re- 
ply packets are returned in the order requested un- 
less deferred, but requests and replies may be inter- 
mixed. For example, two requests may be made, fol- 
lowed by a single reply packet, then another request 
packet, before being completed by two reply pack- 
ets. 
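The request/reply ordering in the example above can be sketched with a small queue model. This is a toy illustration only: the class and method names are hypothetical, and deferred replies (which may return out of order) are not modeled.

```python
from collections import deque

MAX_PENDING = 3  # the text states that up to three requests may be pending


class ApBusPipeline:
    """Toy model: requests queue up; each reply completes the oldest request."""

    def __init__(self):
        self.pending = deque()

    def request(self, tag):
        if len(self.pending) >= MAX_PENDING:
            return False  # pipeline full; the requester would have to wait
        self.pending.append(tag)
        return True

    def reply(self):
        # Non-deferred replies return in the order the requests were made.
        return self.pending.popleft()


# Two requests, one reply, another request, then two replies.
bus = ApBusPipeline()
trace = []
bus.request("A")
bus.request("B")
trace.append(bus.reply())
bus.request("C")
trace.append(bus.reply())
trace.append(bus.reply())
print(trace)
```

The replies come back in request order even though the third request was issued after the first reply.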


The AP-Bus consists of 47 bi-directional signals, a 
clock signal, a RESET signal, and five module sup- 
port signals which are used to interface system mod- 
ules to the AP-Bus (see Figure 3). The BXU is the 
only component that attaches to the AP-Bus.


BXUs connect to each other in the form of a matrix
to allow orderly growth in the system by the addition 
of buses or modules. An 80960MC multiprocessing 
system allows up to 32 modules and four AP-Buses. 
In practice, the number of modules in a system will 
be somewhat less in order to meet the AP-Bus’s 
timing and electrical specifications; a practical limit 
may be 20 to 25 connections to an AP-Bus. Table 3 
contains a summary of the functions of the AP-Bus 
Interface Registers. 


Table 2. Types of AP-Bus Operations

Packet Type    Base Action    Specific Action
Request        Write          Write Word(s)
                              RMW Write Word(s)
               Read           Read Word(s)
                              RMW Read Word(s)
Reply          Accepted       Read Reply Word(s)
                              Write Reply (Ack)
               Refused        Reissue
                              Bad Access


ADVANCED PROCESSOR BUS

Transaction Control (5 lines)
    Arbitration: ARB (3..0)
    Reply Ordering: RPYDEF

Packet Signals (38 lines)
    Specification: SPEC (5..0)
    Address/Data: AD (31..0)

Error Signal Group (4 lines)
    Check Signal: CHK (1..0)
    Bus Error: BERL (1..0)

Synchronization and Initialization Group (2 lines)
    System Clock: CLK2
    Initialization: RESET

Module Support Group (7 lines)
    Identification: INITID
    Module Check: MODCHK
    Bus Output Control: BOUT
    Communication: COM
    Voltage Reference: VREF
    Pop Queue: POPQUE
    Subsystem Busy: SSBUSY


Figure 3. Advanced Processor Bus


Table 3. AP-Bus Interface Registers

Physical ID
    This register contains a unique identifier for a specific BXU
    (or FRC pair of BXUs) on an AP-Bus.

Logical ID
    This register holds the logical ID for the BXU. In every case,
    all BXUs in the same module will share the same logical ID.
    When two modules are married in a QMR configuration, they will
    also share the same logical ID.

Component Specifier
    The contents of this read-only register are fixed at
    manufacture and specify the type and stepping of the component.

Arbitration ID
    When the BXU needs to issue a request on the AP-Bus, it must
    actively arbitrate for the bus. The time and order in which a
    BXU arbitrates is determined by the contents of this write-only
    register.

Com
    This register is used for loading external information, such as
    the type of board the BXU resides on, into the BXU. The register
    is useful for both initialization and diagnostics.

AP-Bus Control
    This register is the general control and status register for
    the BXU's AP-Bus interface.

(register name not recoverable)
    Most of the BXU fault-tolerant capabilities can be enabled by
    altering control bits in this register.

(register name not recoverable)
    The value in this register determines the length of time that
    BXUs will remain quiescent following the beginning of an error
    report.

FRC Splitting Control
    Writing to this register allows a master/checker pair of BXUs
    to be split into separately functioning components.

FRC Register
    The contents of this register determine if a BXU is part of a
    master/checker pair and how the component responds if it is
    part of a QMR module.

Test Detection
    Bits in this register enable parity logic and other internal
    self-testing diagnostic features.

AP Match
    Bits in this register are compared against the corresponding
    bits in the AP-Bus address cycle and determine which partition
    of the address space is recognized by this BXU.

AP Mask
    If a bit in this register is cleared, it will cause the
    corresponding bit position in the Address Match register to be
    ignored during comparisons.


Memory addressing over the AP-Bus is divided into
16-byte blocks. The location of a bus transaction is
defined by a 32-bit address. Each address points to
a single byte that is part of a larger 16-byte block. All
transactions are performed on a single block or portion
of a block, and do not overlap multiple blocks.


Modes of Operation


The BXU operates in either Processor or Memory
mode. Processor mode provides support for Active
or Active/Passive modules, while Memory mode
supports Passive modules. The functions of several
BXU signals are dependent on the operating mode
of the BXU.


In Processor mode, the BXU supports cache, I/O
prefetch and IAC message functions. The BXU can
act as either a master or slave on the L-Bus and
requests can flow in either direction between the
AP-Bus and the L-Bus. The assumption is, however,
that most traffic will flow from the L-Bus out onto the
AP-Bus. In a processor-only module, there is no
need for the BXU to participate in arbitration for the
L-Bus, since it will operate only as a slave.


In Memory mode, the BXU always operates as a 
master on the L-Bus and no requests are ever ac- 
cepted from the L-Bus. All requests flow from the 
AP-Bus into the module. In this mode, the BXU sup- 
ports memory functions and signaling, but does not 
provide caching or I/O prefetch.


Read-Modify-Write Transactions


Read-Modify-Write (RMW) operations are provided
to give BXUs the ability to read and modify a location
as a single indivisible action. An RMW-Read operation
initiates the indivisible action by asserting the
LOCK signal on the L-Bus. An RMW-Write operation
is used to terminate the action.


When an RMW-Read transaction occurs, the block 
of memory addressed is marked by the BXU control- 
ling that portion of memory as locked (the lock cov- 
ers a fixed address space based on address bits 4 
and 6). Once locked, any other RMW-Reads to this 
block will be rejected, but the block remains avail- 
able for other types of memory operations. 


When an RMW-Read is issued, the BXU controlling
the affected memory will either respond with data in
a normal Read Reply (and set the appropriate lock),
or it will respond with a Reissue Reply indicating that
the requested block is already locked. If refused, the
requesting BXU will wait a short interval and then put
the RMW-Read request back into the arbitration process
and try again.


An RMW-Write is equivalent to a Write Word(s) request,
except that it resets the lock for that memory location.
The only valid reply packet is the Ack (Write Reply).
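The lock behavior described above can be sketched as follows. This is a simplified model, not the BXU's implementation: it tracks one lock per 16-byte block rather than the small fixed set of hardware locks, and the class name, method names and reply strings are hypothetical.

```python
class MemoryBxuLocks:
    """Toy model of RMW locking on the memory side (one lock per 16-byte block)."""

    def __init__(self):
        self.locked = set()

    @staticmethod
    def block_of(addr):
        return addr >> 4  # 16-byte block number

    def rmw_read(self, addr, memory):
        blk = self.block_of(addr)
        if blk in self.locked:
            return ("REISSUE", None)  # block already locked; requester retries later
        self.locked.add(blk)
        return ("READ_REPLY", memory.get(addr, 0))

    def rmw_write(self, addr, value, memory):
        memory[addr] = value
        self.locked.discard(self.block_of(addr))  # RMW-Write resets the lock
        return ("WRITE_REPLY", None)

    def read(self, addr, memory):
        # Ordinary reads are not blocked by an RMW lock.
        return ("READ_REPLY", memory.get(addr, 0))


mem = {0x100: 5}
bxu = MemoryBxuLocks()
first = bxu.rmw_read(0x100, mem)    # takes the lock
second = bxu.rmw_read(0x104, mem)   # same block, so it is refused
bxu.rmw_write(0x100, 6, mem)        # releases the lock
third = bxu.rmw_read(0x104, mem)    # now succeeds
```

A refused requester simply reissues the RMW-Read after a short wait, as the text describes.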


Interagent Communications (IAC) Support


Bus Extension Units and 80960MC processors communicate
by sending Interagent Communication (IAC) messages,
which are sent to a set of memory-mapped addresses
recognized by all BXUs. These messages are used for
such system functions as initialization, cache flushing,
access to error logs, and interrupts. The upper
16 Mbytes of the 80960MC's 4 Gigabyte address range
are reserved for IAC communications.


IAC requests fall into two major groups: messages
and register requests. Messages are sent between
processors to cause a processor to perform a specific
action (e.g., start, stop, flush cache, etc.) and
are held in the IAC message support registers; Table
4 summarizes the function of these four registers.
Register requests are used by software to read and
write to BXU registers in order to control the system
operation or configuration.


An IAC message always originates on an L-Bus,
usually from a processor. From the originator, the
request flows to the BXU where it may be handled
internally or propagated on to the AP-Bus. If the IAC
is sent on to the AP-Bus, the final destination of the
IAC (another BXU) must reside on that bus. The IAC
will not be propagated onto another L-Bus or AP-Bus.
IAC messages can be one to four words long.


Although each L-Bus (processor or memory module)
may be connected to as many as four AP-Buses, at
any point in time only one bus will be designated as
the message bus. All IAC messages will flow over
that bus. The BXUs on the message bus are responsible
for handling the IAC message traffic on behalf
of the processors residing on their L-Bus (an L-Bus
may support one or two processors).


AP-Bus 0 normally serves as the message bus. If
AP-Bus 0 is not functional, then AP-Bus 1 serves as
the message bus, completely transparent to the
software. Processors are unaware of which bus is
actually acting as the message bus.


Table 4. IAC Support Registers

Processor 0 Priority
    This register holds the priority of the task (process) which
    Processor 0 on the BXU's L-Bus is currently executing.

Processor 0 Message
    This register buffers four words of data from an IAC message
    for Processor 0.

Processor 1 Priority
    This register holds the priority of the task (process) which
    Processor 1 on the BXU's L-Bus is currently executing.

Processor 1 Message
    This register buffers four words of data from an IAC message
    for Processor 1.


I/O Prefetch Support


The BXU offers two I/O prefetch channels to provide
high bandwidth, low latency access to memory
for sequential transfers. Each channel buffers 32
bytes of data in two 16-byte blocks. As data is
requested from the buffers, the BXU automatically
prefetches the next data block. The BXU can take
advantage of the three-deep AP-Bus pipeline to
quickly fill the buffers if it ever gets behind because
of momentary surges in AP-Bus traffic. In this way,
the prefetch logic acts to provide stable, bounded
response times, even in large multiprocessor
configurations.


Because the normal operation of the BXU hides the 
latency of write requests by replying immediately on 
the L-Bus, the prefetch unit operates only for read 
requests. On a read request from the L-Bus, the pre- 
fetch logic returns the amount of data requested. 
Any processor or intelligent device used with the 
BXU must guarantee that it will split all memory re- 
quests that cross 16-byte boundaries into two re- 
quests. 
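The splitting rule in the last sentence can be sketched as a small helper. The function name and the (address, length) representation are hypothetical; the only fact taken from the text is that no request may cross a 16-byte boundary.

```python
BLOCK = 16  # AP-Bus transactions never span more than one 16-byte block


def split_request(addr, length):
    """Split [addr, addr + length) into pieces that stay inside one 16-byte block."""
    pieces = []
    end = addr + length
    while addr < end:
        boundary = (addr // BLOCK + 1) * BLOCK  # start of the next block
        chunk_end = min(end, boundary)
        pieces.append((addr, chunk_end - addr))
        addr = chunk_end
    return pieces


# An 8-byte access starting at 0x1C crosses the boundary at 0x20.
print(split_request(0x1C, 8))
# A 16-byte access aligned at 0x10 fits in one block and is left alone.
print(split_request(0x10, 16))
```

For any request of 16 bytes or fewer, the helper produces at most two pieces, matching the "two requests" wording in the text.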


Cache Support 


The main function of a cache is to provide local high 
speed storage for frequently accessed memory lo- 
cations. Storing the information locally, the cache 
intercepts memory references and handles them di- 
rectly without transferring the request to the AP-Bus. 
This action results in lower traffic on the AP-Bus and
decreased latency on the L-Bus, leading to improved
performance for a processor on the L-Bus. It
also increases potential system performance in a
multiprocessor system by reducing each processor's
demand for AP-Bus bandwidth, thereby allowing
more processors in a system.


The BXU provides cache directory, coherency logic, 
and control signals, while external SRAM is used for 
data storage. A CACHE signal output from the 
80960MC processor indicates to the BXU whether a 
request is cacheable. The operation of the BXU 
cache is not dependent on the size of the data trans- 
fer and therefore can support partial writes. Both 
data and instructions can be contained within the 
local cache. 


The BXU supports a two-way, set associative cache
with 64 sets. The (read address) tag field is 20 bits
long and consists of LAD lines 31-12. There are
eight bits that indicate if a line is valid (a line is 16
bytes). The control bits in the cache control registers
can be used to mask some of these bits to change
cache configurations. All entries in the directory can
be invalidated by sending an INVALIDATE CACHE
Command to each BXU in the module. Figure 4
shows one example of a BXU cache directory and
its relation to L-Bus addresses.


[Figure 4 diagram: two ways of stored addresses for sets 0 through 63, feeding compare logic and an encoder that produces the way bit]


Figure 4. Example of a Cache Directory Array


A single BXU supports 16 Kbytes of cache. When a
processor module uses multiple BXUs (and therefore
multiple buses), the BXUs cooperate to provide
a larger directory and addressing for a larger cache.
The best way to view this larger directory is to think
of it as having an increased number of sets. Thus a
cache managed by two BXUs will have a directory
consisting of 128 sets instead of 64. The maximum
size cache is 64 Kbytes (four BXUs supporting four
AP-Buses per processor module).
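The cache sizes quoted above follow from the directory geometry. The sketch below assumes that each directory entry covers eight 16-byte lines, one per valid bit; that reading of the "eight bits" statement is an interpretation, chosen because it reproduces the 16 Kbyte figure.

```python
WAYS = 2              # two-way set associative
SETS_PER_BXU = 64     # each BXU contributes 64 sets
LINES_PER_ENTRY = 8   # assumption: one 16-byte line per valid bit
LINE_BYTES = 16


def cache_bytes(num_bxus):
    """Cache capacity managed by a module with the given number of BXUs."""
    sets = SETS_PER_BXU * num_bxus
    return WAYS * sets * LINES_PER_ENTRY * LINE_BYTES


for n in (1, 2, 4):
    print(n, "BXU(s):", cache_bytes(n) // 1024, "Kbytes")
```

With one BXU this gives 16 Kbytes, and with four BXUs the 64 Kbyte maximum.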


The cache is managed using a write-through policy
that guarantees that the shared system memory will
always have the most recent copy of all data; BXU
caches never contain the only copy of revised data.
Any time a processor updates a cache entry, it always
causes a write request on the AP-Bus, so that
there are never any hidden updates. In addition, all
BXUs monitor AP-Bus traffic to detect if an update is
being made to a location which they are storing in
their own cache. If so, that line in the cache directory
is marked invalid. This procedure guarantees that a
BXU cache will always return correct data, even
when a system uses multiple caches, when multiple
processors treat a single data item differently (some
caching, some not), or when two processors are
used on a single L-Bus.
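The write-through and invalidate-on-snoop policy can be illustrated with a toy model. Everything here is a simplification: the class names are hypothetical, entries are tracked per address rather than per 16-byte line, and the choice to update the local copy only on a write hit is an assumption.

```python
class SharedMemory:
    """System memory plus the list of caches that watch bus writes."""

    def __init__(self):
        self.data = {}
        self.snoopers = []

    def write(self, addr, value, source):
        self.data[addr] = value
        for cache in self.snoopers:
            if cache is not source:
                cache.snoop_write(addr)  # other caches see the bus write

    def read(self, addr):
        return self.data.get(addr, 0)


class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        memory.snoopers.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fill from shared memory
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        if addr in self.lines:
            self.lines[addr] = value  # keep the local copy current on a hit
        self.memory.write(addr, value, source=self)  # every write goes to the bus

    def snoop_write(self, addr):
        self.lines.pop(addr, None)  # mark our copy invalid


mem = SharedMemory()
a, b = WriteThroughCache(mem), WriteThroughCache(mem)
mem.data[0x40] = 1
a.read(0x40)
b.read(0x40)
a.write(0x40, 2)   # memory is updated and b's copy is invalidated
value_seen_by_b = b.read(0x40)
```

Because memory always holds the latest value, the invalidated cache simply refills from it on the next read.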


An example of an SRAM control design using a single
BXU is shown in Figure 5. The BXU supplies six
memory control signals to interface the directory and
control logic with an external cache composed of
static RAM: Cache Read (CR), Cache Write (CW),
Way0 (WY0), Way1 (WY1), Word0 (WD0), and
Word1 (WD1). SRAM control also requires use of
the L-Bus byte enable (BE3-BE0) signals and certain
address lines. To simplify latching the byte enable
signals, the BXU asserts READY on all address
and recovery cycles as well as when it is transferring
data.


Figure 5. Sample Cache SRAM Control Design Using a BXU


The tight timing specifications of SRAMs require a
small amount of external logic to interface a static
RAM cache to a BXU. Since all BXU cache signals
have a relatively wide clock to data valid specification
(Teg), external flip-flops are used to achieve
tighter resolution of the Cache Write and Word edges.
The address bits are latched using ALE from the
processor. Way0 selects between the two "ways" in
the cache directory, and Way1 selects between the
cache and private memory (if present on the L-Bus).


In order to ensure that the cache is filled properly,
the byte enable latch is cleared on read requests. If
the processor made a read request for two bytes
that missed the cache, the BXU would first write the
entire 16-byte block, then return the requested
information to the processor. If the byte enable latches
weren't set, then the write into the cache wouldn't
work correctly because not all byte enables would
be asserted. Byte enable information does not need
to be held on reads because data is always returned
in full words and the processor selects the portion of
the word that it needs internally. Signal timings are
shown in Figures 6-10.


Figure 6. Cache Read Signal Timing for 35 ns SRAMs


Figure 8. Cache Read Signal Timing for 70 ns SRAMs


The BXU has four memory address recognizers for
the L-Bus plus an additional recognizer for initialization
RAM. Three of the memory address recognizers
(Mask2-0 and Match2-0) map to shared system
memory, while the fourth address recognizer maps
requests to SRAM on the local bus, called private
memory. The INIT-RAM recognizer serves two functions:
it enables bootstrap software to use the
SRAM cache as a scratch pad during system
initialization, and it provides the means for executing a
memory test on the SRAM cache. The private memory
recognizer allows SRAM to be used on the local
bus as normal memory in addition to a cache. Private
memory is not accessible by other modules on
the AP-Bus.
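A match/mask address recognizer of the kind described above can be sketched in a few lines. The polarity follows the AP Mask description in Table 3 (a cleared mask bit means "ignore this bit"); applying the same polarity to the L-Bus recognizers is an assumption, and the function name and register values are hypothetical.

```python
def recognizes(addr, match, mask):
    """True when addr equals match on every bit position where mask is 1."""
    return ((addr ^ match) & mask & 0xFFFFFFFF) == 0


# Hypothetical partition: compare only the top four address bits,
# so the recognizer claims 0x80000000 through 0x8FFFFFFF.
MATCH = 0x80000000
MASK = 0xF0000000

print(recognizes(0x81234567, MATCH, MASK))  # inside the partition
print(recognizes(0x90000000, MATCH, MASK))  # outside the partition
```

A module with several recognizers would accept a request when any one of its match/mask pairs recognizes the address.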


Memory Module Support 


When operating in Memory mode, the BXU is a Lo- 
cal Bus master and only handles requests inbound 
from the AP-Bus. The cache control logic is disabled 
since it is unnecessary in a memory module. 


A read request received by an idle BXU will be seen
on the L-Bus 1.5 clock cycles after it was received
on the AP-Bus. BXUs offer two reply speed options
for inbound Read requests. The high-performance
option, called the "fast reply" mode, allows data to
flow onto the AP-Bus with only a half-cycle delay
through the BXU. This option requires the L-Bus
memory controller to be able to supply data on every
clock cycle. In the "slow reply" mode, the BXU buffers
the entire AP-Bus reply packet before sending it
onto the AP-Bus. This option permits the use of
slower, less costly memory.


Write requests are fully buffered before being 
passed to the L-Bus. Once the BXU has received an 
error-free packet, it initiates the L-Bus transaction.
When the last data word has been accepted on the 
L-Bus, the BXU generates a reply on the AP-Bus. 


In memory mode, the BXU provides two or four
Read-Modify-Write locks with timeouts. Four locks
are available if the module is not interleaved with
other modules, two locks if it is interleaved. When
interleaving occurs, address bit 4 is used as part of
the address recognition for the module, which thereby
restricts a module to use either locks 0 and 2, or
1 and 3. This approach ensures that if a bus switch
occurs, the locks that may have been allocated on
the failed bus will not overlap with locks that are
currently allocated on the surviving bus (since all
traffic is rerouted to the surviving bus).


FAULT TOLERANCE 


Three basic tenets form the basis for the implementation
of 80960MC fault tolerant systems. First,
fault tolerant functions are achieved through the
replication of VLSI components. Second, the system is
partitioned into a set of confinement areas which
form the basis of error detection and recovery. Third,
only bus-oriented communication paths are used to
provide system communication.


The BXU is unique in that it provides all the functions 
necessary to detect, isolate, and recover from a fail- 
ure in any single system module or AP-Bus. Unlike 
many other fault tolerant system designs, 80960MC 
systems do not rely on voter components for fault 
detection, thereby eliminating one potential source 
of single-point failures. Although the BXU registers 
must be initialized by software, all the fault tolerant 
mechanisms are built into the hardware, and correct 
fault recovery of a system built using the BXU does 
not depend on software intervention. 


The purpose of a confinement area is to inhibit damage
from error propagation and to isolate the faulty
area for subsequent recovery and repair. A confinement
area is defined as a unit (system module or
AP-Bus) that has a limited number of tightly controlled
interfaces. Figure 11 shows the confinement
areas within a small system. Detection mechanisms
exist at every interface to ensure that no inconsistent
data can leave the confinement area and corrupt
other confinement areas. When a fault occurs in the
system, it is immediately isolated to a confinement
area. The fault is known to be in that confinement
area, and all other confinement areas are known to
be fault-free. All intermodule communication in an
80960MC system occurs over buses. There are no
point-to-point or daisy-chained signals.


This arrangement makes modular growth and on-line
repair possible since no signal definition is dependent
on the number of resources in the system.
The presence or absence of any module cannot pre- 
vent communication between any other modules. 
The AP-Bus provides a uniform communications 
matrix that allows multiprocessor and fault-tolerant 
systems to expand modularly. 


In 80960MC systems, there are three distinct steps
in responding to an error. First, the error is detected
and isolated to a confinement area. Next, the error is
reported to all the modules in the system. This action
prevents the incorrect data from propagating
into another confinement area and provides all the
modules with the information required to perform
recovery. Finally, the faulty confinement area is isolated
from the system. Recovery occurs through the
application of redundant resources available in the
system. Table 5 describes the fault-tolerant control
registers.


Figure 11. Fault Confinement Areas in an 80960MC System


Table 5. Fault Tolerance Support Registers and Commands

Test Type
    The Test Report command instructs the BXU to test the error
    reporting network. The type of error report generated is
    determined by the contents of this register.

Spouse ID
    In a QMR module, this register holds the module ID of the FRC
    module to which this module is married.

(register name not recoverable)
    The contents of this register determine if a module is part of
    a QMR pair, and if it should function as the primary or shadow
    in the pair.

Module Error ID
    Identifies the BXU as part of a specific module confinement
    area.

Bus Error ID
    Determines the Bus ID contents in an error report.

Error Log
    Records the type of the most recent error report received and
    the number of errors that have occurred since the last
    Terminate Permanent Error Window command.

Error Record
    Holds the contents of the previous error report.

FT2
    Holds additional fault-tolerant control parameters.

Test Report Command
    The Test Report command instructs the BXU to test the error
    reporting network. The type of error report generated is
    determined by the contents of the Test Type Register.

Primary Catastrophe Command
    A write to this register causes a Primary Catastrophe error
    report, usually indicating a primary module power failure.

Shadow Catastrophe Command
    A write to this register causes a Shadow Catastrophe error
    report, usually indicating a shadow module power failure.

Terminate Permanent Error Window Command
    A write to this register closes the permanent error window, so
    that a recurrence of a previous error is not recorded as
    permanent.

Attach Bus Command
    A write to this register causes the identified bus to be
    attached to the system and become active.

Detach Bus Command
    A write to this register causes the identified bus to be
    detached from the system and become inactive.

Sync Refresh Command
    A write to this register causes BXUs in memory mode to assert
    their ForceRef pin and enables AP-Bus address matching.


Functional Redundancy Checking


BXU components can be paired together to com- 
pare their outputs to ensure that they agree. This 
detection mechanism is called Functional Redun- 
dancy Checking (FRC) because identical compo- 
nents are used to check operations. 


At initialization time, one component in the BXU pair 
is selected to be the “Master”, while the other is 
designated the “Checker”. The Master BXU is re- 
sponsible for carrying out the normal operation of 
the system and behaves as it would if it were operat- 
ing in a non-fault tolerant system. The Checker BXU, 
in contrast, disables its AP-Bus outputs and instead 
monitors the AP-Bus pins of the Master (see Figure 
12). The Checker BXU is responsible for duplicating 
the operation of the Master and using its internal 
comparison circuitry to detect any inconsistency be- 
tween its result and the output of the Master. 


The Master and Checker BXUs run in lock step,
comparing operations cycle-by-cycle. If at any point
the Master and Checker disagree, an FRC error will be
signaled and an error reporting cycle will begin.


When using the FRC mechanism, the BXU pins
comprising the electrical connection to the AP-Bus
must be connected together. A BXU provides FRC
coverage on the AD, SPEC, BOUT and MODCHK
pins.


Failures in the Checker's AP-Bus drivers can be
detected by reversing the roles of the Master and
Checker BXU. When Master/Checker Toggling is
enabled, the roles of the Master and Checker are
switched after each bus cycle.


Parity, Duplication and Timeouts 


In order to prevent incorrect AP-Bus operation from
passing corrupted data to the BXU (and onto the
Local Bus), the BXU uses parity, signal duplication,
and bus timeouts to check for errors. Specifically,
the AP-Bus has interlaced parity bits covering the
AD and SPEC signals, signal duplication is used on
both arbitration and RPYDEF, and a bus timer is set
to monitor the bus for non-response to a request.


Figure 12. Functional Redundancy Checking (FRC)


The BXU calculates two separate parity bits across
alternate AD and SPEC signals, which are indicated
by the CHK0 and CHK1 pins. CHK0 is even parity
across the even AD and SPEC pins, and CHK1 is
even parity across the odd pins. Since the arbitration
and RPYDEF lines are driven independently by multiple
bus agents (BXUs), parity cannot be used for
error detection. Instead, errors are detected
by duplicating each set of lines, one set for Masters,
the other set for Checkers. Consequently, each BXU
connects to only one arbitration network. If there is a
disagreement between the two sets of signals on
the AP-Bus, it will be detected through an FRC
disagreement.


The BXU uses a timer to determine whether too long
a period has elapsed without a response since a bus
request was made. During normal operation the timer
is active whenever the bus pipeline is not empty.
The timer is reset on every bus reply or deferral.
If the BXU was the source of the request and a
timeout occurs, it signals a Bad Access Reply on
the AP-Bus. The timer is nominally 64 clocks.
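The interlaced parity scheme can be sketched as below. Two assumptions are made for illustration: "even" and "odd" pins are taken to mean even and odd bit positions within AD (31..0) and SPEC (5..0), and "even parity" is taken to mean that each check bit makes the total number of ones in its group (check bit included) even. The function names are hypothetical.

```python
EVEN_AD, ODD_AD = 0x55555555, 0xAAAAAAAA   # even/odd bit positions of AD (31..0)
EVEN_SPEC, ODD_SPEC = 0x15, 0x2A           # even/odd bit positions of SPEC (5..0)


def parity(x):
    """1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def check_bits(ad, spec):
    """Return (CHK0, CHK1) for one AD/SPEC value under the stated assumptions."""
    chk0 = parity(ad & EVEN_AD) ^ parity(spec & EVEN_SPEC)
    chk1 = parity(ad & ODD_AD) ^ parity(spec & ODD_SPEC)
    return chk0, chk1


def parity_ok(ad, spec, chk0, chk1):
    return check_bits(ad, spec) == (chk0, chk1)


sent = (0x0000F00F, 0x21)
chk = check_bits(*sent)
corrupted_ad = sent[0] ^ 0x00000004   # flip one even-numbered AD bit in transit
print(parity_ok(sent[0], sent[1], *chk), parity_ok(corrupted_ad, sent[1], *chk))
```

Interlacing means that an error affecting two adjacent signals flips one bit in each group, so both check bits change instead of the errors cancelling.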


Error Reporting 


The error reporting network is the backbone of fault
isolation and recovery. When an error is detected,
the BXU detecting the error reports its type and
location to all other nodes in the system. The error
reporting network is designed so that, independent
of an error in the system, each node not only
receives an error report, but is guaranteed to receive
the same error report. Each BXU in the system
uniformly logs each error report, and is able to use this
information to proceed independently with the
appropriate recovery procedure.


The BXU has two serial Error Reporting Lines
associated with each bus interface (BERLs for the
AP-Bus and LERLs for the Local Bus). An identical
serial error report is sent over each pair of lines
associated with each bus.


An AP-Bus error reporting cycle consists of five 
phases: Reporting, Partner Communications, Tran- 
sient Waiting Period, Retry, and the Permanent Error 
Window (see Figure 13). The reporting phase lasts 
256 cycles from the beginning of the first report re- 
ceived on the BXU’s error reporting lines. The BXU 
becomes quiescent as soon as it detects the start bit 
of an error report and remains quiescent through the 
Transient Waiting Period.


Figure 13. Error Reporting Cycle


During partner communications, BXUs communicate
with each other via their POPQUE lines to determine
whether to retry accesses in the case that one of the
AP-Buses is removed from the system. Partner
ordering lasts 256 cycles.


Transient waiting enables the system to sustain dis- 
turbances from mechanical vibrations and brief elec- 
trical transients without needing to permanently re- 
configure the system. The BXUs simply wait a pre- 
determined time for the transient to subside. The du- 
ration of the Transient Waiting Period is adjustable 
and can be set by software (16 us to 500 ms at 
16 MHz). During this period, the BXU completes its 
internal recovery mechanisms (if the error is perma- 
nent). Since the transient waiting mechanism on the 
buses depends on all buses moving to the retry 
state at the same time, all BXUs must have identical 
values for the Transient Waiting Period. 
During the RETRY phase, all accesses that were
pending at the time the error report was received
are retried. At the same time as RETRY begins, the
BXU enters the Permanent Error Window. During this
interval, the BXU watches for the error to recur.
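
The phase lengths above are given partly in bus cycles and partly in time, so a small conversion helps when comparing them. The following is a minimal sketch, assuming the 16 MHz clock quoted in the text and assuming the reporting, partner communications, and transient waiting phases run back to back. The function names are illustrative and do not come from this data sheet.

```python
# Sketch only: converts error-reporting phase lengths between bus cycles
# and microseconds. BUS_HZ is the 16 MHz figure quoted in the text; the
# helper names are hypothetical.
BUS_HZ = 16_000_000
REPORTING_CYCLES = 256   # reporting phase
PARTNER_CYCLES = 256     # partner communications phase


def cycles_to_us(cycles, bus_hz=BUS_HZ):
    """Return the duration of `cycles` bus cycles in microseconds."""
    return cycles * 1_000_000 / bus_hz


def us_to_cycles(microseconds, bus_hz=BUS_HZ):
    """Return the number of bus cycles that fit in `microseconds`."""
    return round(microseconds * bus_hz / 1_000_000)


def quiescent_time_us(transient_wait_us, bus_hz=BUS_HZ):
    """Time from the start of a report until RETRY begins (assumed back-to-back phases)."""
    fixed = cycles_to_us(REPORTING_CYCLES + PARTNER_CYCLES, bus_hz)
    return fixed + transient_wait_us
```

With the shortest Transient Waiting Period (16 µs), the three phases add up to 768 cycles. That appears to match the automatic-recovery time quoted at the end of the Self-Healing Systems discussion, although the data sheet does not state this derivation.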


Each BXU has two registers that are used for log- 
ging error reports. The ERROR LOG register con- 
tains the current error report and the ERROR REC- 
ORD register contains the previous error report. 
When an error report is received, the contents of the
ERROR LOG register are copied into the ERROR 
RECORD register. Both registers are accessible by 
software and are the primary means by which the 
software routines responsible for system manage- 
ment communicate with the hardware fault handling 
mechanisms. Table 6 lists the types of errors that 
can be reported. 


Table 6. Error Types

Error Type                  Description

Unsafe Confinement Area     Issued when an error is detected that would make
                            a retry dangerous.

Primary Catastrophe         Generated in response to a Primary Catastrophe
                            Command from software. The command is usually
                            issued when all primary modules are about to fail
                            because of a loss of power.

Shadow Catastrophe          Generated in response to a Shadow Catastrophe
                            Command from software. The command is usually
                            issued when all shadow modules are about to fail
                            because of a loss of power.

Error Reporting Error       Indicates that a BXU has detected a fault on one
                            of its error reporting lines.

Bus Arbitration             Issued when an FRC error is detected on the BOUT
                            pin of the BXU, indicating a bus arbitration error.

Bus Parity                  Indicates that a parity error has been detected on
                            the AP-Bus.

Component                   Indicates that a checker has detected an FRC error
                            while its master was driving the AP-Bus.

Uncorrectable Array Error   An uncorrectable error has been detected in one of
                            the memory arrays.

Correctable ECC             A correctable error has been detected in one of
                            the memory arrays.

COM Altered                 Occurs when the COM input is toggled (two cycles
                            high, followed by two cycles low). May be used by
                            external circuits to notify the system of an
                            external fault.

Attach Bus                  Issued in response to an Attach Bus command. Used
                            to reactivate a bus that was previously out of
                            service.

Detach Bus                  Issued in response to a Detach Bus command. Used
                            to remove a faulty bus from the system.

Terminate Permanent         Receiving this report signifies the end of the
Error Window                Permanent Error Window.

Sync Refresh                Used to synchronize memory modules that are being
                            married to form a Primary/Shadow Pair.

The BXU’s hardware compares the contents of the
two error reporting registers to determine if a bus 
retry has resulted in a repeat of the previous error 
(which therefore must be considered a permanent 
error). Software can clear the two registers by send- 
ing a Terminate Permanent Error Window command. 
The registers allow software to monitor the health of 
the system and to respond appropriately in case of 
hardware problems. The availability of this informa- 
tion simplifies diagnostic routines. 
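
The interplay of the two registers can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the BXU's internal logic: reports are represented as arbitrary comparable values, a permanent error is taken to mean that ERROR LOG equals ERROR RECORD, and the class and method names are hypothetical.

```python
class ErrorRegisters:
    """Toy model of one BXU's ERROR LOG / ERROR RECORD register pair."""

    def __init__(self):
        self.error_log = None      # current error report
        self.error_record = None   # previous error report

    def receive(self, report):
        """Log a new report; the old ERROR LOG contents move to ERROR RECORD."""
        self.error_record = self.error_log
        self.error_log = report

    def is_permanent(self):
        """True when the latest report repeats the previous one (assumed rule)."""
        return self.error_log is not None and self.error_log == self.error_record

    def terminate_permanent_error_window(self):
        """Model of the software command that clears both registers."""
        self.error_log = None
        self.error_record = None
```

In this model a first Bus Parity report is treated as transient. If the same report arrives again after the retry, `is_permanent()` returns true until software issues the terminate command. Which report fields the real hardware compares is not specified here.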


The ERROR LOG register is handled independently 
by hardware and software; hardware always re- 
sponds immediately to an error report so that it is 
never lost by failure of software to respond. During 
normal system operation, software should never 
write to this register, since it is both read and written 
by hardware. The ERROR LOG register is cleared 
on a cold start, but its contents are retained across a
warm start. 


[Figure: block diagram showing master and checker modules (Processor or Memory) on an L-Bus, each connected through M82965 BXUs to AP-Bus 0 and AP-Bus 1.]

M82965 


ADVANCE INFORMATION 


RECOVERY MECHANISMS 


Module Shadowing 


Automatic recovery from permanent single-point fail- 
ures in a module is accomplished through module 
shadowing, or what is more formally called Quad 
Modular Redundancy (QMR). Using this technique, 
two FRC pairs (master/checker) of the same type 
are logically linked to form a primary/shadow pair 
(see Figure 14). The marriage of the two modules is 
performed by software which sets the logical ID of 
the two modules equal and restarts them in lock 
step (or synchronous operation). There is no direct
electrical connection between a primary/shadow
pair. They are usually on separate boards so that
either can be removed in the case of a failure in that
module.

The primary/shadow pair operates in lock step so
that there is always a complete and current backup 
for an FRC pair. At any point in time, one FRC pair 
will be active (i.e., sending its output to the AP-Bus) 
while the other will be passive (i.e., its outputs will be 
disabled). Initially, the primary FRC pair is active and 
is responsible for issuing requests or replies to the 
AP-Bus. Data leaves only by means of the active 
FRC pair. 


As an option, the roles of active and passive mod- 
ules are switched after every second bus cycle. (In 
contrast, master/checker pairs are toggled every cy- 
cle). This ping-pong action exercises all of the logic 
in both primary and shadow modules. Any latent fail- 
ure that exists in the AP-Bus drivers will be detected 
immediately. All of the logic to perform this lock step 
operation is contained in the BXU and neither the 
processors nor any discrete logic contained in a 
module is aware that the module is participating as 
one-half of a primary/shadow pair.


Each physical FRC pair (primary and shadow) remains
a self-checking pair. Whether in an active or
passive module, all detection mechanisms remain
enabled and continuously check the operation of
that module. Neither the primary nor the shadow
checks the operation of the other; FRC is used for
fault detection, while module shadowing (Quad Modular
Redundancy) is used to ensure immediate recovery.

Figure 15. Faulty Modules are Automatically Disabled


Automatic Module Recovery 


If a permanent error is detected in either a primary or
a shadow FRC pair, the faulty pair will immediately
be disabled as all BXUs in the pair shut down. The
surviving spouse then separates itself from the faulty 
FRC pair and operates as an active pair on every 
bus cycle. At that point, recovery is complete. 
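
The active/passive scheduling described above can be sketched as a pure function of the bus cycle number. This is an illustration only. It assumes bus cycles are numbered from 0, that the primary pair is active for the first two cycles when the optional ping-pong mode is enabled, and that a failed pair is simply excluded. The function name and parameters are hypothetical.

```python
def active_pair(bus_cycle, primary_ok=True, shadow_ok=True, ping_pong=True):
    """Return which FRC pair drives the AP-Bus on a given bus cycle."""
    if primary_ok and shadow_ok:
        if not ping_pong:
            return "primary"            # primary stays active by default
        # Roles swap after every second bus cycle: P, P, S, S, P, P, ...
        return "primary" if (bus_cycle // 2) % 2 == 0 else "shadow"
    if primary_ok:
        return "primary"                # survivor is active on every cycle
    if shadow_ok:
        return "shadow"
    return None                         # both pairs have shut down
```

For example, `[active_pair(c) for c in range(6)]` alternates in groups of two, while `active_pair(c, primary_ok=False)` returns the shadow pair on every cycle, mirroring the "surviving spouse" behavior.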


Hardware recovery is autonomous and requires no 
software intervention to complete. The operating 
system can be informed that a hardware reconfigu- 
ration has taken place by tying an error report line to 
one of the processor’s interrupt pins. Then when a 
fault occurs, a processor can examine the error re- 
port log to discover what has happened and then re- 
examine the system configuration. Figure 15 shows 
an example of module recovery.

Bus Switching


All AP-Buses in an 80960MC system are physically 
identical, but when a system is operational each bus 
handles a unique address range. The BXU has been 
designed so that it is possible to pair together two 
AP-Buses and have them act as redundant or alternate
resources for each other. AP-Bus 0 is paired
with AP-Bus 1 and AP-Bus 2 is paired with AP-Bus 3. 
In order for an FRC pair to have an additional bus, it 
must also have another pair of Master/Checker 
BXUs. Normally the memory addresses will be interleaved
between the two (or four) buses, but this isn’t
necessary for bus switching.


Since the AP-Bus does not hold state information
(as do processors and memory), all buses in the system
may be used during normal operation. There is
no degradation of throughput to achieve bus redundancy.
Each bus is fully operational.


When a permanent error has been detected on an 
AP-Bus, all BXUs on the faulty bus disable them- 
selves. L-Bus requests for the failed bus will be ig- 
nored by the disabled BXUs and picked up instead 
by the BXUs attached to the backup bus. If a BXU 
has a cache, the BXU invalidates its cache directory 
since the directory must be reorganized to match the 
new (and larger) address space, including a new in- 
terleaving factor. Figure 16 shows an example of 
bus switching. 
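
The pairing rule and the failover behavior can be sketched in a few lines. This is a simplified model under loud assumptions: the interleave granularity (16 bytes here) and the modulo interleaving scheme are hypothetical, since this data sheet does not specify them, and the function names are illustrative.

```python
def partner_bus(bus):
    """AP-Bus 0 is paired with AP-Bus 1, and AP-Bus 2 with AP-Bus 3."""
    return bus ^ 1


def home_bus(address, num_buses=2, interleave_bytes=16):
    """Hypothetical interleave: consecutive blocks rotate across the buses."""
    return (address // interleave_bytes) % num_buses


def route(address, failed_buses=frozenset(), num_buses=2, interleave_bytes=16):
    """Return the bus that services `address`, switching to the backup if needed."""
    bus = home_bus(address, num_buses, interleave_bytes)
    if bus not in failed_buses:
        return bus
    backup = partner_bus(bus)
    if backup >= num_buses or backup in failed_buses:
        raise RuntimeError("no working bus for this address range")
    return backup
```

After a failure, the backup bus services both its own address range and that of its failed partner, which is why a cached directory built for the old interleaving can no longer be trusted.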


[Figure 16 labels: PRIMARY, SHADOW. Hardware automatically reconfigures to bypass the faulty bus (AP-Bus 0). AP-Bus 1 takes over the address space of AP-Bus 0.]

Figure 16. If a Bus Fails, Its Backup Bus Takes Over Immediately

Self-Healing Systems


In some applications it is important to guarantee the 
integrity of the data, but momentary interruptions in 
processing can occur without seriously affecting op- 
erations or jeopardizing human lives. For these ap- 
plications, a cost effective approach may be to use 
self-healing systems. 


Self-healing systems use Functional Redundancy 
Checking to ensure that all errors are detected and 
that faults are confined within a module. Fault recovery
is not automatic; recovery and reconfiguration are
done by software following error detection. Self-healing
systems are less costly than fully fault-tolerant
systems because fewer components are necessary.


Self-healing systems do not operate continuously in 
the case of a hardware failure. Program execution 
cannot proceed after detection of a permanent error 
until the system has been reconfigured. Transient 
errors will still be taken care of by the hardware 
components. Upon detection of a permanent error,
the system will cease operation; however, FRC ensures
that no data will have been corrupted.


After the system stops, it must be reset and a diagnostic
program run that reads the BXU error logs
and determines the most appropriate action to take.
Recovery and reconfiguration may be complete and
the system back on-line within a few seconds to several
minutes, depending on the nature of the fault.


Self-healing systems are not appropriate for real- 
time applications where program delays longer than 
a few milliseconds cannot be tolerated. In these crit- 
ical applications, an interruption in system operation 
might result in damage to expensive material and 
equipment, or endangerment of human lives. The 
80960MC system fault-tolerant architecture provides
the means for building systems that will recover
automatically within 48 µs.

BXU Registers


Initialization and control of the BXU is done by read- 
ing and writing the BXU’s internal registers. The reg- 
isters are mapped to the upper 16 Mbytes of the 
80960MC processor’s physical address space. 
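
As a worked example of the mapping, the following sketch computes where the register window would sit. It assumes a 32-bit physical address space (the L-Bus carries 32-bit physical addresses), which places the upper 16 Mbytes at 0xFF000000 and above. The constant and function names are illustrative only; individual register offsets are not given in this data sheet.

```python
ADDRESS_BITS = 32                                   # assumed physical address width
REGISTER_WINDOW_BYTES = 16 * 1024 * 1024            # upper 16 Mbytes
REGISTER_WINDOW_BASE = (1 << ADDRESS_BITS) - REGISTER_WINDOW_BYTES


def is_register_address(physical_address):
    """True if the address falls in the upper 16 Mbytes of the address space."""
    return REGISTER_WINDOW_BASE <= physical_address < (1 << ADDRESS_BITS)
```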


Initialization of a system using BXUs occurs in three
stages. In the first stage, which immediately follows
RESET, all registers (except for the registers containing
error report information) are loaded with 0 or
with values sampled off a set of pins.


During this stage the BXU’s System Bus ID and 
mode of operation are established. In the second 
stage, software assigns logical, physical, and arbitra- 
tion IDs to each BXU. Then in the third stage, the 
COM pin can be used to load board-specific infor- 
mation into the BXU and software can change the 
default values of any of the registers. 


Once software has established the initial configuration
of the system, no further interaction between
the system software and the BXU may be necessary
except for testing the error reporting functions and
for making on-line changes to the system’s initial
configuration.

This Advance Information Data Sheet contains a
functional description for each of the BXU’s major
register groups. For more specific details on controlling
each of the registers, please consult the
80960MC Hardware Designer’s Reference Manual.


SIGNAL DESCRIPTIONS 


Tables 7 through 11 describe the function of each of 
the BXU signals. Many of the pins are multiplexed 
and have different interpretations depending on 
whether the BXU is in Processor or Memory mode.

Table 7. M82965 BXU L-Bus Signals

Name and Function

LOCAL ADDRESS/DATA BUS: Carries 32-bit physical addresses and data to and from a
processor or memory. During an address (Ta) cycle, bits 2-31 contain a physical word
address (bits 0-1 indicate SIZE; see below). During a data (Td) cycle, bits 0-31 contain
read or write data. The LAD lines are active HIGH and float to a three-state OFF when the
bus is not acquired.
SIZE: Comprises bits 0-1 of the LAD bus during a Ta cycle and specifies the size
of a transfer in words.

    LAD1  LAD0  Size
    0     0     1 Word
    0     1     2 Words
    1     0     3 Words
    1     1     4 Words

ADDRESS-LATCH ENABLE: Indicates the transfer of a physical address. ALE is
asserted during a Ta cycle, and deasserted during Ty cycles and the second half of Ta
cycles. It is active LOW and floats to a three-state OFF when the L-Bus is not acquired.

ADDRESS STATUS: Is used to detect address cycles and additional data cycles.

CACHEABLE: During a Ta cycle, specifies whether data is cacheable. When operating
in the MEMORY mode this pin should be tied to ground through a 10 kΩ resistor.

WRITE/READ: Specifies, during a Ta cycle, whether the operation is a write or read. It is
latched on-chip and remains valid during Td cycles.

CACHE WRITE: (Defined only when the BXU is in PROCESSOR mode). This signal
indicates that the cache SRAM should be written with data from the L-Bus and is used to
generate the chip select and write enable signals required by the SRAM. The signal is
open drain so it can be shared among multiple BXUs controlling a single set of SRAMs.
DATA ENABLE: (Defined only when the BXU is in MEMORY mode). Is asserted during
Td cycles and indicates transfer of data on the local AD bus lines.

CACHE READ: (Defined only when the BXU is in PROCESSOR mode). This signal
indicates that the cache SRAM should drive data onto the L-Bus in response to a read
request and is used to generate the chip select and output enable signals required by the
SRAM. This signal is open drain so it can be shared among multiple BXUs controlling a
single cache.
DATA TRANSMIT/RECEIVE: (Defined only when the BXU is in MEMORY mode).
Indicates the direction of data transfer. It is low during Ta and Td cycles for a read or
interrupt acknowledgement; it is high during Ta and Td cycles for a write. DT/R never
changes state when DEN is asserted.

BUS LOCK: Is used by the BXU to distinguish between normal reads and RMW-reads,
and between normal writes and RMW-writes.
An 80960MC processor asserts LOCK at the beginning of an RMW cycle, and the BXU
recognizes it as an RMW-read. If the read operation is accepted by the module serving
memory, the processor drops LOCK and executes an RMW-write. LOCK is also held
asserted during an interrupt-acknowledge transaction.

READY: Indicates that data on LAD lines can be sampled or removed. If READY is not
asserted during a Td cycle, the Td cycle is extended to the next cycle, and ADS is not
asserted in the next cycle. READY is driven on Ta, Ty and Tj cycles.
NOTES:
I/O = Input/Output, I = Input, O = Output, O.D. = Open Drain, T.S. = three-state

Table 7. M82965 BXU L-Bus Signals (Continued)

Symbol        Type   Name and Function

BE3-BE0       O.D.   BYTE ENABLES: Specify which data bytes on the local bus
                     will take part in the next bus cycle. BE3 corresponds to
                     LAD24-LAD31 and BE0 corresponds to LAD0-LAD7.

HOLD/HOLDAR          HOLD: Indicates that a master I/O peripheral requests
                     control of the bus. When the BXU receives HOLD and grants
                     the peripheral control of the bus, it floats the bus
                     lines and then asserts HLDA and enters the Th state. When
                     HOLD is deasserted, the BXU will deassert HLDA and go to
                     either the Ti or Ta state.
                     HOLD ACKNOWLEDGE REQUEST: Is an input indicating to the
                     secondary bus master that the primary bus master has
                     relinquished control of the bus.

HLDA/HOLDR           HOLD ACKNOWLEDGE: Relinquishes control of the bus to a
                     master I/O peripheral.
                     HOLD REQUEST: Is used by a Secondary Bus Master to
                     request use of the bus from the Primary Bus Master.
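
The SIZE encoding and byte-enable mapping in Table 7 lend themselves to a short decoding sketch. The code below is illustrative only. It assumes the value sampled from the LAD lines during an address cycle is available as a 32-bit integer, and it assumes the byte-lane mapping in which BE3 covers LAD24-LAD31 and BE0 covers LAD0-LAD7. The function names are hypothetical.

```python
def decode_address_cycle(lad):
    """Split a 32-bit address-cycle LAD value into (word_address, size_in_words)."""
    lad &= 0xFFFFFFFF
    size_in_words = (lad & 0b11) + 1   # 00 -> 1 word, 01 -> 2, 10 -> 3, 11 -> 4
    word_address = lad >> 2            # bits 2-31 hold the physical word address
    return word_address, size_in_words


def byte_enable_lanes(n):
    """Return (lowest, highest) LAD bit numbers covered by byte enable BEn."""
    if n not in (0, 1, 2, 3):
        raise ValueError("byte enable index must be 0..3")
    return 8 * n, 8 * n + 7
```

For instance, an address-cycle value of 0x00001003 decodes to word address 0x400 with a four-word transfer. The electrical polarity of the byte enables is not modeled.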


Table 8. M82965 BXU L-Bus Module Support Signals

Symbol      Name and Function

BADAC       BAD ACCESS: If asserted in the cycle following the one in which
            the last READY of a transaction is asserted, it indicates that
            the transaction was a bad access that exceeded the AP-Bus
            time-out period.

            INTERAGENT COMMUNICATION: PROCESSOR 0: (Defined only when the
            BXU is in PROCESSOR mode). Is an open-drain output that indicates
            that there is a pending IAC message for Processor 0 on the BXU’s
            local bus.
            EXTERNAL ERROR: (Defined only when the BXU is in MEMORY mode).
            Is an input that indicates that an error has been detected in
            external logic (e.g., a failure in a discrete memory controller).

IAC1/FRFE   INTERAGENT COMMUNICATION: PROCESSOR 1: (Defined only when the
            BXU is in PROCESSOR mode). Is an open-drain output that indicates
            that there is a pending IAC message for Processor 1 on the BXU’s
            local bus.
            FORCE REFRESH: (Defined only when the BXU is in MEMORY mode).
            Is an open-drain output that tells the external memory controller
            to immediately execute a refresh operation.

PFETCH      PREFETCH: Is used in conjunction with the Cache and Write/Read
            (W/R) signals to define the type of request being issued
            (0 = LO, 1 = HI):

            PFETCH  CACHE  W/R
            0       0      0     Read using Prefetch Channel 0
            0       0      1     Start for Prefetch Channel 0
            0       1      0     Read using Prefetch Channel 1
            0       1      1     Start for Prefetch Channel 1
            1       0      0     Noncacheable Read
            1       0      1     Noncacheable Write
            1       1      0     Cacheable Read
            1       1      1     Cacheable Write
NOTES:
I/O = Input/Output, I = Input, O = Output, O.D. = Open Drain

Table 9. M82965 BXU AP-Bus Signals

Name and Function

SYSTEM ADDRESS/DATA LINES: Carry 32-bit addresses and data
between modules (BXUs) on an AP-Bus. The content of the AD lines is
defined by the SPEC encoding during the same bus cycle.

PACKET SPECIFICATION: Signals define the packet type and the
parameters required for the transaction:
    SPEC5: REQUEST: Is asserted if the packet is a request packet.
    SPEC4: MULTICYCLE: Is asserted if the packet consists of more than
    one bus cycle.
    SPEC3-SPEC2: CYCLE COUNT: These two bits are used in
    conjunction with the Request and Multicycle signals to specify the
    length of the packet (in bus cycles) and the data length (in words).
    SPEC1-SPEC0: OPERATION/STATUS TYPE: These two bits identify
    the specific operation or status conveyed by the packet.

CHECK SIGNALS: Provide interlaced parity for the SPEC and AD
lines.

ARBITRATION: Signals are used by the bus agents (BXUs) to
determine which agent has access to the bus next. These signals have
a timing that is one-half cycle out of phase with the AD lines.

REPLY DEFER: Signal allows an agent to give up its “slot” on the bus
temporarily if its access is going to take a long time. This action
reorders the pipeline, moving the deferred request to the bottom of the
queue, resets the bus time-out counter and permits another agent to
use the bus.

BUS ERROR REPORT LINES: Are used to signal errors from bus
transactions or from within modules connected to the bus.

NOTES:
I/O = Input/Output, I = Input, O = Output, O.D. = Open Drain
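
The packet specification fields in Table 9 can be unpacked with simple bit masks. The sketch below assumes the six SPEC lines have been sampled into an integer with SPEC5 as the most significant bit and that an asserted line reads as 1. The meaning of individual cycle-count and operation-type codes is not given in this data sheet, so the fields are returned as raw numbers. The type and function names are hypothetical.

```python
from typing import NamedTuple


class PacketSpec(NamedTuple):
    request: bool      # SPEC5: packet is a request packet
    multicycle: bool   # SPEC4: packet spans more than one bus cycle
    cycle_count: int   # SPEC3-SPEC2: raw two-bit cycle count field
    op_type: int       # SPEC1-SPEC0: raw two-bit operation/status type


def decode_spec(spec):
    """Unpack a 6-bit SPEC value (SPEC5 is bit 5) into its four fields."""
    spec &= 0x3F
    return PacketSpec(
        request=bool(spec & 0x20),
        multicycle=bool(spec & 0x10),
        cycle_count=(spec >> 2) & 0b11,
        op_type=spec & 0b11,
    )
```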
Table 10. M82965 BXU AP-Bus (Local Agent) Support Signals

Symbol    Name and Function

CLK2      SYSTEM CLOCK: Provides the base timing and synchronization for all
          agents (BXUs) in the system. It is sourced to all agents from a
          central clock and is twice the frequency of the bus cycle.
          NOTE: The clock skew over the AP-Bus for a typical system should
          be no greater than 6 ns for correct system operation.

BOUT      BUS OUTPUT CONTROL: Is asserted whenever a component is driving the
          AP-Bus. Functional Redundancy Checks on BOUT can be used to detect
          arbitration failures.

MODCHK    MODULE CHECK: Is connected between Master/Checker pairs, allowing a
          Functional Redundancy Check to be performed on internal states.

INITID    INITIALIZE ID: Is connected to one of the 32 AD lines and is used in
          conjunction with the IDENTIFY DEVICE IAC to provide a unique address
          for each BXU at initialization time.

VREF      VOLTAGE REFERENCE: Provides a stable voltage reference for the input
          buffers of components connected to the AP-Bus. External hardware
          must provide a VREF/W voltage (see Table 14) on the VREF pin during
          normal operation of the component. The VREF pin is also used to
          distinguish between a warm start (system memory and the Error Record
          register retain their state) and a cold start (system memory and BXU
          registers are cleared).

RESET     RESET: Forces all agents on the bus to reset and synchronize. The bus
          cycle begins the first CLK2 period after RESET is deasserted. The
          RESET signal is the way a BXU is synchronized to the rest of the
          system.

COM       COMMUNICATION: Can be used to load information into a component as
          part of the initialization sequence or to inform external logic that
          the component has failed. The BXU will assert COM if it has shut
          itself off due to a failure in its module.
          The COM signal is not involved in any aspect of AP-Bus operation, but
          can be used to load board-dependent information into the BXU or to
          signal the rest of the system that an external error has occurred.

NOTES:
I/O = Input/Output, I = Input, O = Output, O.D. = Open Drain
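
The COM Altered entry in Table 6 describes the external-fault signature on the COM input as two cycles high followed by two cycles low. A minimal sketch of recognizing that pattern in a list of per-cycle samples follows. It assumes one sample per bus cycle with 1 meaning high, and it matches any high, high, low, low window. Whether the hardware requires exactly two high cycles is not stated, and the function name is hypothetical.

```python
def find_com_toggle(samples):
    """Return the index where a high, high, low, low pattern starts, or -1."""
    pattern = (1, 1, 0, 0)
    levels = [1 if s else 0 for s in samples]
    for i in range(len(levels) - len(pattern) + 1):
        if tuple(levels[i:i + len(pattern)]) == pattern:
            return i
    return -1
```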
Table 11. M82965 BXU Module Support Signals

Symbol     Name and Function

WY0/COR    WAY0: (When the BXU is in processor mode). Indicates which one of
           the two “ways” in a directory set had a cache hit. The line is
           intended to drive the SRAM address pins and will remain stable
           throughout the length of a cache access.
           CORRECT: (When the BXU is in memory mode). Is used by the BXU to
           tell an external ECC controller to correct the memory data as it
           flows onto the local bus. If this signal is not asserted, then the
           memory data may flow directly onto the local bus with only error
           checking, but no correction.

WY1/MEM    WAY1: (Defined only when the BXU is in PROCESSOR mode).
           Indicates if the access is for the cache or private memory half of
           the SRAM. The line is intended to drive the SRAM address lines
           directly and will remain stable throughout the length of a cache
           access.
           MEMORY/REGISTER REQUEST: (Defined only when the BXU is in
           MEMORY mode). This signal allows mapping some of the BXU’s
           register space out to the registers in an external controller. If
           the signal is high, the associated L-Bus request is a memory
           request; otherwise, the L-Bus request is to an external register on
           the board.

           WORD0: (Defined only when the BXU is in PROCESSOR mode).
           Provides the low order bit of the word address for the SRAM.
           Together with WORD1, the two bits indicate which of the four words
           within an address line should be addressed. Because SRAM timing is
           critical, an external latch could be required. The signals change
           for each word of data transferred.
           UNCORRECTABLE ECC: (Defined only when the BXU is in MEMORY
           mode). Is an input used by the external ECC logic to signal to the
           BXU that it has detected an uncorrectable memory error.

           WORD1: (Defined only when the BXU is in PROCESSOR mode).
           Provides the high order bit of the word address for the SRAM.
           Together with WORD0, the two bits indicate which of the four words
           within an address line should be addressed. Because SRAM timing is
           critical, an external latch will be required. The signals change
           for each word of data transferred.
           ECC ERROR: (Defined only when the BXU is in MEMORY mode). Is
           an input used by the external ECC logic to signal to the BXU that
           it has detected a memory error. The signal will be asserted even
           though external logic may be correcting the error and providing
           correct data on the L-Bus. If the BXU is asserting its CORRECT
           signal, the ECC ERROR signal will be ignored. Only the UNC pin will
           be checked for an error indication under these conditions.

SSBUSY     SUBSYSTEM BUSY: Connects together all BXUs in a module that are
           in the same subsystem. When the signal is pulled low (BUSY), the
           BXUs will accept a request address, but will not continue with the
           data cycles. This signal is used to ensure that the BXUs always
           handle RMW-writes, Interagent Communication messages, and retries
           correctly. An external signal is needed because BXUs can generate
           AP-Bus requests internally because of the prefetcher, or their
           internal logic can be tied up handling an IAC request from the
           AP-Bus.
Table 11. M82965 BXU Module Support Signals (Continued)

POP QUEUE: Is used by the two BXUs acting as bus backups for each
other to communicate status on the completion of outstanding L-Bus
requests. Usually, this signal is asserted when the oldest write in the
queue has completed. During the partner ordering period, a different
protocol is used to convey the status of all write requests outstanding.

LOCAL ERROR REPORTING LINES: Are identical to the BERL
signals defined for the AP-Bus, but are used on the module side to
connect all BXUs on a single L-Bus.

NOTES:
I/O = Input/Output, I = Input, O = Output, O.D. = Open Drain
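
The interaction between the CORRECT, ECC ERROR and UNC signals described in Table 11 amounts to a small truth table. The sketch below encodes one reading of it. It assumes each signal is available as a boolean meaning "asserted", and it assumes that when CORRECT is not asserted either error input counts as an error indication. That second assumption goes beyond what the table states, and the function name is hypothetical.

```python
def memory_error_indicated(correct_asserted, ecc_error, unc):
    """Return True if the BXU would see an error indication for this access."""
    if correct_asserted:
        # External logic is correcting data, so ECC ERROR is ignored
        # and only the UNC input is examined.
        return bool(unc)
    # Checking only (no correction): assume any reported error counts.
    return bool(ecc_error or unc)
```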


MECHANICAL DATA

Pin Assignment

The MG82965 BXU (PGA package) pinout as
viewed from the top side of the component (pins
down) is shown in Figure 17 and from the bottom
side (pins up) in Figure 18.

Vcc and GND connections must be made to multiple
Vcc and GND pins. Each Vcc and GND pin must
be connected to the appropriate voltage or ground
and externally strapped close to the package. Preferably,
the circuit board should include power and
ground planes for power distribution. Table 12 lists
the function of each pin.

Many of the signals are multiplexed and several signals
have different interpretations depending on
whether the BXU is used in Processor or Memory
mode.
Figure 17. MG82965 BXU Pinout: View from Top Side (Pins Down)

Figure 18. MG82965 BXU Pinout: View from Bottom Side (Pins Up)

Table 12. M82965 PGA Pinout: In Pin Order

Table 13. M82965 Pinout: In Signal Order

[Table contents not recoverable from the source; the only legible entry is pin N10, MODCHK.]


Package Dimensions and Mounting


The MG82965 BXU is packaged in either a 132-lead 
ceramic pin-grid array (PGA) or a 164-pin CQP pack- 
age. (Contact factory for details on CQP availability.) 
Pins in the PGA package are arranged 0.100 inch 
(2.54 mm) center-to-center, in a 14 by 14 matrix, 
three rows around. See Figure 19. 


A wide variety of available sockets allows low-insertion
or zero-insertion force mountings, and a choice
of terminals such as soldertail, surface mount, or
wire-wrap. Figure 20 shows several applicable sockets.


Package Thermal Specification 


The M82965 BXU is specified for operation when its 
case temperature is within the range of —55°C to 
+125°C. The PGA case temperature should be 
measured at the center of the top surface opposite 
the pins as shown in Figure 21.

Figure 19. A 132-Lead Pin-Grid Array (PGA) Used to Package the MG82965 BXU

Amp Incorporated
(Harrisburg, PA 17105 U.S.A.
Phone 717-564-0100)

  Amp LIF Socket
  • Low insertion force (LIF) soldertail: 55274-1
  • Amp tests indicate 50% reduction in insertion force compared to
    machined sockets

  Other socket options
  • Zero insertion force (ZIF) soldertail: 55583-1
  • Zero insertion force (ZIF) burn-in version: 55573-2

  Cam handle locks in low profile position when MG82965 is installed
  (handle UP for open and DOWN for closed positions).
  Courtesy Amp Incorporated

Advanced Interconnections
(5 Division Street
Warwick, RI 02818 U.S.A.
Phone 401-885-0485)

  Peel-A-Way* and Kapton Socket Terminal Carriers
  • Low insertion force surface mount: CS132-37TG
  • Low insertion force soldertail: CS132-01TG
  • Low insertion force wire-wrap: CS132-02TG (two-level),
    CS132-03TG (three-level)
  • Low insertion force press-fit: CS132-05TG

  Peel-A-Way Carrier No. 132:
  Kapton Carrier is KS132
  Mylar Carrier is MS132
  Molded Plastic Body KS132 is shown.
  Footprint No. 132: 14 x 14 x 3 rows

  Courtesy Advanced Interconnections
  (Peel-A-Way Terminal Carriers
  U.S. Patent No. 4442938)

*Peel-A-Way is a trademark of Advanced Interconnections.

Figure 20. Several Socket Options for Mounting the M82965 BXU

[Figure 21 label: measure case temperature at center of top surface of the 132-pin PGA.]

Figure 21. Measuring MG82965 Case Temperature


ELECTRICAL SPECIFICATIONS 


Power and Grounding 


The M82965 is implemented in CHMOS III technolo- 
gy and has modest power requirements. Its high 
clock frequency and numerous output buffers (ad- 
dress/data, control, error, and arbitration signals) 
can cause power surges as multiple output buffers 
drive new signal levels simultaneously. For clean on- 
chip power distribution at high frequency, seven Vcc 
and thirteen Vss pins separately feed functional 
units of the M82965. 


Power and ground connections must be made to all 
Vcc and Vss pins of the M82965. On the circuit 
board, all Vcc pins must be strapped closely togeth- 
er, preferably on a Vcc plane. Likewise, all Vss pins 
should be strapped together, preferably on a ground 
plane. 


Power Decoupling Recommendations 


Liberal decoupling capacitance should be placed 
near the M82965. The BXU, when driving its two 32-bit
address/data buses (AP-Bus and L-Bus), can
cause transient power surges, particularly when driving
large capacitive loads.


Low inductance capacitors and interconnects are 
recommended for best high frequency electrical per- 
formance. Inductance can be reduced by shortening 
the board traces between the BXU and decoupling 
capacitors as much as possible. 


Connection Recommendations 


For reliable operation, always connect unused in- 
puts to an appropriate signal level. In particular, if 
PFETCH or LERL0-1 are not used, they should be
pulled up and if the CACHE input is not used (i.e.,
BXU operating in the Memory mode) it should be
tied low through a 10 kΩ resistor. No inputs should
ever be left floating. 


All open-drain outputs require a pullup device. While 
in most cases a simple pullup resistor will be ade- 
quate, a network of pullup and pulldown resistors 
biased to a valid VIH (e.g., 3.5V) will limit noise and
AC power consumption, especially on the AP-Bus. 
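
For a pullup/pulldown network of the kind mentioned above, the two resistor values follow from the desired bias voltage and the desired equivalent (Thevenin) resistance. The sketch below shows the arithmetic only. The 5.0 V supply and any target resistance are assumptions chosen for illustration, not values from this data sheet, and the function name is hypothetical.

```python
def split_termination(v_bias, r_thevenin, vcc=5.0):
    """Return (r_pullup, r_pulldown) in ohms for a resistive divider.

    The divider's open-circuit voltage is v_bias and its Thevenin
    resistance (pullup in parallel with pulldown) is r_thevenin.
    """
    if not 0 < v_bias < vcc:
        raise ValueError("bias voltage must lie strictly between 0 and vcc")
    r_pullup = r_thevenin * vcc / v_bias
    r_pulldown = r_thevenin * vcc / (vcc - v_bias)
    return r_pullup, r_pulldown
```

For a 3.5V bias from an assumed 5V supply, the pulldown works out to 7/3 of the pullup value. Actual resistor values should be chosen against the drive currents given under Operating Conditions.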
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ABSOLUTE MAXIMUM RATINGS* 


Case Temperature 


under Bias(1)........... —55°C to + 125°C Case’ 


Storage Temperature .......... —65°C to + 150°C - - 
Voltage on Any Pin.......... —0.5V to Voc + 0.5V 
Power Dissipation..............0000eeeeeeees 2.5W 


Operating Conditions 


Output Low Voltage:
    IOL = 4 mA:     LAD Lines
    IOL = 5 mA:     Controls(2)
    IOL = 25 mA:    L-Bus Open-Drain Outputs
    IOL = 80 mA:    AP-Bus Open-Drain Outputs

Output High Voltage:
    IOH = 1 mA:     LAD Lines
    IOH = 0.9 mA:   Controls(2)
    IOH = 5.0 mA:   ALE

Input Capacitance
I/O or Output Capacitance


NOTES: 


M82965 


ADVANCE INFORMATION 


NOTICE: This data sheet contains information on 
products in the sampling and initial production phases 
of development. The specifications are subject to 
change without notice. Verify with your local Intel | 
Sales office that you have the latest data sheet be- 
fore finalizing a design. 


* WARNING: Stressing the device beyond the “Absolute 
Maximum Ratings” may cause permanent damage. 
These are stress ratings only. Operation beyond the 
“Operating Conditions” is not recommended and ex- 
tended exposure beyond the “Operating Conditions” 
may affect device reliability. 


1. Test frequency = 1 MHz, Tc = 25°C, unmeasured pins at GND. 


2. "Controls" include all L-Bus I/O pins not otherwise specified.
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A.C. SPECIFICATIONS 


This section describes the A.C. specifications for the 
M82965 pins. All input and output timings are speci- 
fied relative to the 1.5V level of the rising edge of 
CLK2, and refer to the time at which the signal 
reaches (for output delay and input setup) or leaves 
(for hold time) the TTL levels of LOW (0.8V) or HIGH 
(2.0V). 


All A.C. testing should be done with input voltages of 
0.45V and 2.4V. 


Maximum output hold times are the same as mini- 
mum output delays. Tri-state signals have no resis- 
tive load or termination. 


The Output Delay specified for open-drain signals 
includes both the low to high and high to low tran- 
sitions. The float delay is the amount of time that the 
pulldown transistor may remain active. This specifi- 
cation is provided to help system designers calcu- 
late propagation delay for terminations other than 
the one used for testing. 
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Table 15. M82965 A.C. Timing Specifications (Over Specified Operating Conditions) 


Symbol Comments. 
v1 125 

Ts 1 

Ty | Clock Fall Time | 10 | 

Ts 10 


Output Valid Delay: 


Cy. = 100 pF 
CL = 125 pF 


LAD 
W 
CW, 


D, SS Busy 
CR 
Controls(1) 


FEF 
oO) 


Tr | ALE Width 
ALE Invalid Delay 


Tg 


Output Float Delay: 
LAD C. = 100 pF 
WY | C. = 125 pF 
Controls() | | CL = 75 pF 


Input Setup Time: 

LOCK, HOLD, HOLDAR, READY 10% Point 
ECC, UNG | 10% Point 
Controls() 10% Point 


T11   Input Data Hold
T12   Setup to ALE Inactive                CL = 100 pF (LAD)
                                           CL = 75 pF (Controls)
T13   Hold after ALE Inactive              CL = 100 pF (LAD)
                                           CL = 75 pF (Controls)

T15   RESET Setup
T16   RESET Width                          1250

T17   Clock to Data Valid (AP-Bus)         17 ns    CL = 50 pF, IOL = 50 mA
T18   Clock to High Impedance (AP-Bus)
T19   Output Hold (AP-Bus)
T20   Input Setup (AP-Bus)
T21   Input Hold (AP-Bus)


NOTE: 
1. “Controls” include all L-Bus I/O pins not otherwise specified. 


[Waveform diagram: CLK2 drive levels (HIGH LEVEL (MIN) 0.55 VCC, LOW LEVEL (MAX) 1.0V) with the valid windows of the L-Bus outputs and inputs, including the LERL sample points. 271082-26, 271082-27]


*NOTE: 
LERL signals must be asserted at both edges Ao and Ag in order for them to be recognized by the BXU. 


Figure 23. Drive Levels and Measurement Points for A.C. Specifications. 
L-Bus Timings for the BXU as a Bus Slave 
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EDGE 


[Waveform diagram: CLK2 with the valid windows of the L-Bus outputs (LAD, ADS, BE3-BE0, DEN, CACHE, HOLDR, W/R, LOCK, HLDA, ALE) and inputs (LAD, READY, LOCK, HOLD, HOLDAR), including the LERL sample points. 271082-28]


*NOTE:
LERL signals must be asserted at both edges Ap and Ag in order for them to be recognized by the BXU. 


Figure 24. Drive Levels and Measurement Points for A.C. Specifications. 
L-Bus Timings for the BXU as a Bus Master 
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Tq Ty Ty, 




Figure 25. Relative Timing for L-Bus Signals 




[Waveform diagram: AP-Bus outputs and inputs (AD(31-0), SPEC, RPYDEF, CHK, ARB, BERL) shown against CLK2 edges A, B, C, and D across bus cycle phase 1 and phase 2. 271082-30]


*NOTE: 
BERL signals must be asserted at both edges Apo and Ag in order for them to be recognized by the BXU. 


Figure 26. Relative Timing for AP-Bus Signals


[Timing diagram: RESET setup and hold relative to CLK2. INIT parameters (BADAC, IAC0) must be set up 8 clocks prior to the indicated CLK2 edge and must be held beyond the second indicated CLK2 edge. T15 = RESET setup; T16 = RESET width. 271082-31]


Figure 27. RESET Setup and Hold Timing
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L-BUS DESIGN CONSIDERATIONS 


Input hold times can be disregarded by the designer
whenever the input is removed because of a subsequent
output from the BXU (e.g., DEN becomes
deasserted). In other words, whenever the BXU generates
an output that indicates a transition into a
subsequent state, the BXU will have sampled any
inputs for the previous state.


As an example, in the recovery (Tr) cycle following a
read, the minimum time (t6 min) that DEN becomes
asserted is specified to be less than the minimum
hold time on the data (t11 min). When DEN is asserted,
however, the data is guaranteed to have been
sampled.


Similarly, whenever the BXU generates an output
that indicates a transition to a subsequent state, any
outputs that are specified to be tri-stated in this new
state will be tri-stated.


For example, in the data (Td) cycle following an address
(Ta) cycle for a read, the minimum output delay
(t6 min) of DEN is specified to be less than the
maximum float time of LAD (t9 max). When DEN is
asserted, however, the LAD outputs are guaranteed
to have been tri-stated.


AP-BUS SIGNAL TIMING
CONSIDERATIONS


The AP-Bus uses three-quarter cycle signaling for 
data transmission. Data is driven on edge D and 
sampled on edge C. This approach allows three- 
quarters of the bus cycle to be used for data trans- 
mission. 


The remaining (one-quarter) time allows for clock
skew and signal hold time. All AP-Bus signals except
for the ARB, CHK, and BERL signals use this timing.
The relationship of the AP-Bus signals is shown in
Figure 28.


The CHK signals (interlaced parity) are delayed by 
one-half cycle or one phase to allow for generation 
of parity from the internal data that is being transmit- 
ted. The CHK lines are sampled one phase after the 
data has been sampled and compared against the 
parity generated for the received data. 


Most input signals on the AP-Bus are sampled on 
the rising edge of CLK2 at edge C. The exceptions 
are the error signals CHK, BERL and ARB, which 
are sampled on the rising edge of CLK2 at edge A. 
Regardless of the edge, the setup and hold times 
are the same. 


All outputs on the AP-Bus are driven relative to the 
falling edge of CLK2 at the middle of phase 2, ex- 
cept CHK, BERL and ARB, which transition on the 
falling edge of CLK2 at the middle of phase 1. 


When designing a system based on the AP-Bus, the
system topology will be limited by the available propagation
time for signals in the system. The propagation
time must allow for settling of ringing, ground
shift, and crosstalk, all of which are dependent on
board and system materials and design.


The following equation gives the propagation time 
available, given a specific clock implementation and 
frequency: 


Tprop = 2T1 - (T3 + T4 + T5 + (T18 or T19) + T20a + Tskew)


Where Tskew is the worst case clock skew between 
BXUs (clock skew is the time delay between any two 
clocks in the system due to physical distribution lim- 
its). 


In AP-Bus systems, this skew is defined as follows: 


T skew s T3 + T20 = 141 


L-Bus Waveforms 


Figures 30 through 36 illustrate the relationship of
L-Bus signals during a variety of bus transactions. For
a detailed discussion of the operation of the L-Bus,
consult the 80960MC Hardware Designer's Reference
Manual.
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Figure 29. System and Processo 
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Figure 30. L-Bus Read Transaction 
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M82965 ADVANCE INFORMATION 
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Figure 31. L-Bus Write Transaction 
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Figure 32. L-Bus Burst Read Transaction 
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Tg : 


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> ll pp ss 
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Figure 33. L-Bus Burst Write Transaction with One Wait State 
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SECONDARY 


DELAY OF 5 ns MINIMUM IS REQUIRED

271082-39


Figure 34. Hold Timing
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PREVIOUS 
CYCLE ACKNOWLEDGEMENT (5 BUS STATES) ACKNOWLEDGEMENT 


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Figure 35. Interrupt Acknowledge Transaction 
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Figure 36. Bus Exchange Transaction (PBM = Primary Bus Master, SBM = Secondary Bus Master) 
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Memories and Peripherals 


85C960
1-MICRON CHMOS
80960 K-SERIES BUS CONTROL µPLD

- Burst Logic, Ready Control, and Address Decode Support for 80960 KA/KB Embedded Controllers in Single Chip
- Burst Logic Supports Both Standard and New Generation "Burst Mode" Memories and Peripherals
- Ready/Timing Control Supports 0-15 Wait States across 8 Address Ranges, Read/Write Accesses, Burst Transactions
- 8 Dedicated Inputs Decoded into 8 Latched Chip Selects (4 External/Internal; 4 Internal Only)
- Operates with 80960KA/KB at 20 MHz and 25 MHz
- Icc = 50 mA Max.
- UV Erasable (CerDIP) or OTP™
- 100% Generically Testable Logic Array
- Based on Low Power CHMOS IIIE* Technology
- Available in 28-Pin 300-mil CerDIP and PDIP Packages and in 28-Pin PLCC Package
  (See Packaging Spec., Order Number #231369)

*CHMOS is a patented technology of Intel Corporation.


Pin  Signal        Pin  Signal
 1   RESET/VPP     28   Vcc
 2   I7            27   ADS#
 3   I6            26   DEN#
 4   I5            25   W/R#
 5   I4            24   CS0#
 6   I3            23   CS1#
 7   I2            22   CS2#
 8   I1            21   CS3#
 9   I0            20   AD0
10   BLAST#        19   AD1
11   RDY#          18   AD2
12   WCLK#         17   AD3
13   A3            16   A2
14   GND           15   CLK2

290192-1
# = Active Low Signals


Figure 1. Pinout Diagram. 
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GENERAL DESCRIPTION 


The Intel 85C960 is a single-chip burst/ready/decode
µPLD (Microcomputer Programmable Logic
Device) designed to interface 80960 KA/KB embedded
controllers to system memory and I/O. The
85C960 provides programmable chip selects, a programmable
read/write access wait state/ready generator,
and burst address (A2, A3) cycling. Burst
transaction cycling of A2, A3, and WCLK# (Write
Clock) is also supported for intelligent peripherals on
the bus.


For its programmable functions, the 85C960 uses
advanced EPROM cells as logic array and wait-state
table memory elements. Coupled with Intel's proprietary
CHMOS IIIE technology, the result is a programmable
device able to support Intel's 32-bit
80960 KA/KB embedded controllers at speeds up to
25 MHz.


ARCHITECTURE DESCRIPTION 


The 85C960 µPLD integrates burst control, ready
generation, and chip select decoding into a single
device. Figure 2 shows the architecture of the
85C960. Table 1 lists and describes each signal on
the device. The 85C960 replaces 6-10 separate
PLD/discrete logic devices in small- and medium-sized
80960 systems. For medium- to large-sized
systems, the 85C960 can be supplemented with an
additional decoder, such as the 85C508, and a second
85C960. Figure 3 shows a single 85C960 in a
typical application.


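[Block diagram: Chip Select Decoder, Bus State Tracker, Burst Control, Address Counter, and Ready Generation (open-drain RDY#) blocks, with internal selects CS4#-CS7#, the BLAST# and WCLK# outputs, and the programmable resources marked. 290192-3]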


Figure 2. 85C960 Block Diagram 
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i 


[Block diagram: 85C960 connected to non-burst mode memory and burst mode memory, with ADS, CLK2, and RESET connections. 290192-4]


Table 1. 85C960 Pin Descriptions

Symbol       Name and Function

RESET        RESET. When RESET is high for a minimum of four CLK2 cycles,
             internal circuits are reset to a known state.

I7-I0        INPUT 7-INPUT 0. These are the address range inputs to the
             programmable decode logic array.

CLK2         SYSTEM CLOCK. This input, which connects to the 80960 CLK2
             signal, provides the timing reference for all 85C960 operations.

AD3-AD0      ADDRESS IN 3-ADDRESS IN 0. These inputs are driven by LAD0-LAD3
             from the Local Bus (L-Bus) to provide addressing and burst
             access decode information.

W/R#         WRITE/READ. Write/Read from controller. When low, indicates that
             the current access is a read. When high, indicates that the
             current access is a write.

DEN#         DATA ENABLE. This input from the controller indicates that data
             is present on the L-Bus.

ADS#         ADDRESS/DATA STROBE. This input from the 80960 indicates whether
             address or data information is currently on the L-Bus. When low,
             address information is changing. The 85C960 chip select timing
             is based in part on ADS# low during Ta states.

BLAST#       BURST LAST. This signal, when low, indicates that the current
             read/write access is the last access in a burst transaction.
             BLAST# is not cycled if RDY# is generated off-chip.

WCLK#        WRITE CLOCK. This output provides a write enable strobe to
             memories that do not support burst mode access.

A3, A2       ADDRESS OUT 3, 2. These outputs cycle during burst transactions.
             Typically connected to lowest memory address signals.

CS3#-CS0#    CHIP SELECT 3-CHIP SELECT 0. Single p-term select outputs that
             are driven active (low) for the programmed address condition on
             I7-I0.

RDY#         READY. RDY# is an active low, bidirectional, open-drain signal
             that should be connected to the controller's Ready input. As an
             output, RDY# goes high to cause the controller to extend the
             current access. RDY# goes low to indicate that the data on the
             L-Bus may be sampled (read) or removed (write). RDY# is
             controlled by the 85C960 Ready Generation and Wait-State Logic.
             The open-drain output allows RDY# to be OR-tied to other
             circuitry that may drive the controller's Ready input. As a
             bidirectional input, RDY# allows the 85C960 to provide Ready
             timing and burst cycling for intelligent peripherals that do not
             generate these signals themselves.
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80960 L-Bus (Local Bus) cycles are monitored by 


the Bus State Tracker to synchronize the functional 
blocks in the 85C960 to the L-Bus. CLK2 provides 
the timing reference for all 85C960 operations. 


Four external chip selects (CS0#-CS3#) are generated
by the programmable Chip Select Decoder.
These four signals provide decoded selects to memory
and I/O devices and are routed to the programmable
Wait-State Table so that the 85C960 can
generate RDY# at the appropriate time. Four additional
selects are decoded (internal only) and routed
to the Wait-State Table so that the 85C960 can generate
RDY# for up to four additional address
ranges.


The Ready Generation block generates RDY# to
the controller under control of the Wait-State Table.
Depending on the contents programmed into this table
and the current type of access, from 0-15 wait
states can be introduced into each bus cycle. An
independent wait state value can be chosen for
each select and each access type. Four access
types are possible: read first, read subsequent, write
first, and write subsequent.


The Burst Control and Address Counter blocks
control burst transaction timing to memory and I/O.
Note that the RDY# pin is sampled by the Burst
Control block to allow the 85C960 to generate burst
transaction timing for other bus peripherals. WCLK#
provides a write enable strobe for memory and I/O
that do not support burst mode. BLAST# informs
burst-mode devices that the current access is the
last one in a burst transaction. A2 and A3 are cycled
to select the address location for each access.


FUNCTIONAL DESCRIPTION 


The following paragraphs provide a detailed descrip- 
tion of each functional block in the 85C960 PLD. 


Chip Select Decoder 


The Chip Select Decoder, shown in Figure 4, is a
high speed, single p-term (product-term) latched decoder
circuit with eight inputs (I0-I7) and eight
latched outputs. Each output goes low when its associated
product term is true. Four of these outputs
(CS0#-CS3#) are available externally to be used
as device selects. The remaining four outputs
(CS4#-CS7#) are available internally so that the
85C960 can provide ready and burst timing for four
more device selects. (The actual selects for these
four additional devices/resources must be generated
by external logic.)


The input to each latch is a single NAND p-term that
can be connected to the dedicated inputs. The true
and complements of all inputs (I7-I0) are available
to all eight NAND p-terms.


Each intersecting point in the logic array is connected
or not connected based on the value programmed
in the EPROM array. Initially (EPROM
erased state), no connections exist between any
p-term and any input. Connections can be made by
programming the appropriate EPROM cells. Since
p-terms are implemented as NANDs, a true condition
on a p-term drives the output low. Current consumption
is higher when both true and complement
p-terms for the same input are programmed.
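
The decode rule described above can be sketched in a few lines of Python. This is only an illustrative model, not Intel's implementation: the function name `chip_select_output`, the dictionary encoding of a p-term, and the example address range are all assumptions made for the sketch, and the latch driven by the internal LE signal is not modeled.

```python
def chip_select_output(inputs, pterm):
    """Return the level of one select output (0 = active low, 1 = inactive).

    inputs: 8-bit integer; bit n holds the level on input In (I7..I0).
    pterm:  dict mapping an input number to the level that input must have
            (1 = true input connected, 0 = complement connected).
            This encoding is an assumption of the sketch.
    """
    if not pterm:
        # An unprogrammed (erased) p-term is outside the scope of this sketch.
        raise ValueError("p-term has no programmed connections")
    term_true = all(((inputs >> bit) & 1) == level
                    for bit, level in pterm.items())
    # NAND implementation: a true product term drives the output low.
    return 0 if term_true else 1


# Hypothetical address range: select when I7 = 1 and I6 = 0.
CS0_TERM = {7: 1, 6: 0}

print(chip_select_output(0b10000000, CS0_TERM))  # product term true
print(chip_select_output(0b11000000, CS0_TERM))  # product term false
```

Inputs that do not appear in the dictionary are "don't care", which corresponds to leaving both the true and the complement connection for that input unprogrammed.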


Selects are latched on the falling edge of an internal
Latch Enable (LE), which is generated from ADS#,
DEN#, and CLK2. The proper combination of these
signals occurs during an 80960 address state (Ta).
Figure 5 shows the relationship of the internal LE
and external chip selects to the three signals at the
end of a Ta state. All selects are cleared to an inactive
high state at the start of a recovery state (Tr).
All eight selects (four external and four internal) are
routed to the Wait-State Table.


Wait State Table 


Chip selects, WR (Write/Read), and SW (Subsequent
Word) feed the Wait-State Table. Each chip
select points to a set of four wait state values while
WR and SW determine which of the four values to
route to the Ready Generation block (see Figure 6).
The four values are grouped into read and write
groups with each group having a value for the first
access and subsequent access (second through
fourth). The four-bit wait-state value is sent to the
Ready Generation block (via WS0#-WS3#) to be
used as an initial count value. If two selects are active,
the resulting count value is the logical bit AND
of the two individual values. If more than two selects
are active and the individual count values are not the
same, the resulting count value is indeterminate. If
no select is active, no count value is loaded (and the
Ready Generation circuit is disabled).
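
The lookup and combination rules above can be illustrated with a short Python sketch. The table layout, the names `WAIT_STATE_TABLE` and `initial_count`, and the entries for select 1 are assumptions made for illustration; only the select 0 entries are taken from the Figure 6 example.

```python
from functools import reduce

# Wait-state entries indexed by select number, then by (WR, SW).
# WR: 0 = read, 1 = write.  SW: 0 = first word, 1 = subsequent word.
WAIT_STATE_TABLE = {
    0: {(0, 0): 0b0000, (1, 0): 0b0000,   # Figure 6 example values
        (0, 1): 0b0011, (1, 1): 0b0010},
    1: {(0, 0): 0b0001, (1, 0): 0b0001,   # hypothetical values
        (0, 1): 0b0001, (1, 1): 0b0001},
}


def initial_count(active_selects, wr, sw):
    """Return the count value loaded into Ready Generation, or None."""
    if not active_selects:
        return None  # no select active: no count value is loaded
    values = [WAIT_STATE_TABLE[cs][(wr, sw)] for cs in active_selects]
    if len(values) > 2 and len(set(values)) > 1:
        # More than two selects with differing values: indeterminate.
        raise ValueError("count value is indeterminate")
    # Bitwise AND of the individual values.
    return reduce(lambda a, b: a & b, values)


print(initial_count([0], wr=0, sw=1))      # subsequent read on select 0
print(initial_count([0, 1], wr=0, sw=1))   # two selects active: bitwise AND
```

Because the combination is a bitwise AND, overlapping two selects can only keep or reduce the number of wait states relative to each individual entry.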


Ready Generation 


RDY# is high at the start of each burst transaction.
The RDY Generator begins to count down from the
wait state value, decrementing the counter at the
start of each wait state. When the internal counter
reaches 0000, RDY# is pulled low (CLK2c during
the data state). On the next CLK2c edge (for a wait
state), RDY# is released, allowing an external resistor
to pull RDY# high. Figure 7 shows the timing for
a four-word burst write transaction with 1 wait state
for the first access and 0 wait states for the remaining
three accesses (Burst Write 1-0-0-0).
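
As a rough illustration of the countdown, the following Python sketch lists the RDY# level in each data or wait state of a burst. It is a simplified, state-level model under the assumption that an access with N wait states occupies N states with RDY# high followed by one state with RDY# low; the function name `rdy_trace` is hypothetical, and CLK2 edge timing is not modeled.

```python
def rdy_trace(wait_states):
    """Return the RDY# level (1 = high, 0 = low) for each state of a burst.

    wait_states: wait-state count for each access in the burst,
                 e.g. [1, 0, 0, 0] for a Burst Write 1-0-0-0.
    """
    trace = []
    for count in wait_states:
        trace.extend([1] * count)  # counter non-zero: access is extended
        trace.append(0)            # counter at 0000: RDY# pulled low
    return trace


print(rdy_trace([1, 0, 0, 0]))  # [1, 0, 0, 0, 0]
```

Under this model the 1-0-0-0 burst takes five states in total: one wait state plus four ready states.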


RDY# is an open-drain I/O pin, which must be connected
to pullup and pulldown resistors as shown in
Figure 8. During a wait-state access, RDY# is pulled
high to cause the controller to extend the current
access so that the memory or peripheral chip has
time to present data to the bus (read), or sample
data on the bus (write). RDY# is released on the
CLK2a edge of a Tr state. If a Read or Write access
occurs without a chip select having been decoded
on-chip, the RDY# output buffer is disabled and
RDY# is sampled as an input. This allows the
85C960 to cycle A2, A3, and WCLK# to provide
burst transaction timing for other bus controllers.
RDY# may be OR-tied with other bus controllers so
they can access the processor Ready signal.


Figure 4. 85C960 Chip Select Decoder Block
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Latch opens when CLK2 and DEN# go high and ADS# goes low. 
Latch closes when DEN# goes low or ADS# or CLK2 go high. 


Figure 5. Internal LE and External Chip Select Timing 


Burst Transactions 


AD3, AD2 are latched to indicate the starting ad- 
dress of a burst transaction. The 85C960 places 
these two signals out on A3 and A2, respectively, 
then cycles the two addresses upward until the last 
access of the burst. The 85C960 assumes that the 
processor handles splitting of the burst transaction 
when a 16-byte boundary is crossed. 


AD0 and AD1 specify the size of the burst transfer in
double-words as shown in Table 2.


Table 2. AD0-AD1 vs Burst Size

AD1    AD0    No. of Words Transferred
 0      0      1
 0      1      2
 1      0      3
 1      1      4

WCLK#, BLAST# Generation


WCLK# is the write enable signal for writing to non-burst
mode memories. When low, address outputs
A2 and A3 are valid. Its trailing edge (low-to-high
transition) can be used to latch data into non-burst
mode memories. WCLK# is only provided during
writes; during reads, WCLK# remains high.


BLAST# indicates that the current access is the last
access in a burst transaction. BLAST# is used by
burst-mode memories to reset internal address
counters. BLAST# is not cycled when RDY# is generated
off-chip.


POWER-ON CHARACTERISTICS 
85C960 inputs and outputs begin responding 1 µs
(max.) after Vcc power-up (Vcc = 4.75V) or after a
power-loss/power-up sequence. RESET must be
synchronous to CLK2 and must be held high for a
minimum of 4 clock cycles after Vcc reaches 4.75V.
After 4 clock cycles, A2 and A3 are high, CS0#-CS3#
(and CS4#-CS7#), BLAST#, and WCLK# are
high, and the open drain RDY# signal is inactive.


                                        Write/Read
Select    Access                        WR = 0 (Read)    WR = 1 (Write)
                                        msb   lsb        msb   lsb
CS0#      SW = 0 (First Word)           0000             0000
          SW = 1 (Subsequent Word)      0011             0010

msb = most significant bit
lsb = least significant bit


Figure 6. Example Wait-State Entries for CS0#


ERASURE CHARACTERISTICS


Erasure time for the 85C960 is 20 minutes at
12,000 µWsec/cm2 with a 2537Å UV lamp.


Erasure characteristics of the device are such that
erasure begins to occur upon exposure to light with
wavelengths shorter than approximately 4000Å. It
should be noted that sunlight and certain types of
fluorescent lamps have wavelengths in the 3000Å-4000Å
range. Data shows that constant exposure to
room level fluorescent lighting could erase the typical
85C960 in approximately two years, while it
would take approximately two weeks to erase the
device when exposed to direct sunlight. If the device
is to be exposed to these lighting conditions for extended
periods of time, conductive opaque labels
should be placed over the device window to prevent
unintentional erasure.


The recommended erasure procedure for the
85C960 is exposure to shortwave ultraviolet light
with a wavelength of 2537Å. The integrated dose
(i.e., UV intensity x exposure time) for erasure
should be a minimum of fifteen (15) Wsec/cm2. The
erasure time with this dosage is approximately 20
minutes using an ultraviolet lamp with a 12,000 µW/cm2
power rating. The device should be placed within
1 inch of the lamp tubes during exposure. The
maximum integrated dose the 85C960 can be exposed
to without damage is 7258 Wsec/cm2 (1
week at 12,000 µW/cm2). Exposure to high intensity
UV light for longer periods may cause permanent
damage to the device.


LATCH-UP IMMUNITY


All of the input, output, and clock pins of the device
have been designed to resist latch-up which is inherent
in inferior CMOS processes. The 85C960 is designed
with Intel's proprietary 1-micron CHMOS
EPROM process. Thus, each of the pins will not experience
latch-up with currents up to ±100 mA and
voltages ranging from -0.5V to (Vcc + 0.5V). The
programming pin is designed to resist latch-up to the
13.5V max. device limit.


DESIGN RECOMMENDATIONS


For proper operation, it is recommended that all input
and output pins be constrained to the voltage
range GND < (Vin or Vout) < Vcc. All unused inputs
should be tied high or low to minimize power
consumption (do not leave them floating). Unused
outputs may be left floating. A high-speed ceramic
decoupling capacitor of at least 0.2 µF must be connected
directly between the Vcc and GND pin.


As with all CMOS devices, ESD handling procedures
should be used with the 85C960 to prevent damage
to the device during programming, assembly, and
test.


FUNCTIONAL TESTING


Since the programmable sections of the 85C960 are
controlled by EPROM elements, the device is completely
testable during the manufacturing process.
Each programmable EPROM bit controlling the internal
logic is tested using application independent
test patterns. EPROM cells in the device are 100%
tested for programming and erasure. After testing,
the devices are erased before shipment to the customers.
No post-programming tests of the EPROM
array are required.


The testability and reliability of EPROM-based programmable
logic devices is an important feature
over similar devices based on fuse technology.
Fuse-based programmable logic devices require a
user to perform post-programming tests to ensure
device functionality. During the manufacturing process,
tests on fuse-based parts can only be performed
in very restricted ways in order to avoid pre-programming
the array.
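
The dose figures quoted under ERASURE CHARACTERISTICS follow from the stated relation integrated dose = UV intensity x exposure time. The short Python sketch below checks that arithmetic; the function names are hypothetical, and the only inputs are the lamp rating and dose limits given in the text.

```python
def integrated_dose(intensity_uw_per_cm2, seconds):
    """Integrated dose in Wsec/cm2 for a lamp rated in µW/cm2."""
    return intensity_uw_per_cm2 * seconds / 1_000_000


def exposure_time(dose_wsec_per_cm2, intensity_uw_per_cm2):
    """Exposure time in seconds needed to reach a dose in Wsec/cm2."""
    return dose_wsec_per_cm2 * 1_000_000 / intensity_uw_per_cm2


LAMP = 12_000  # µW/cm2, the lamp rating quoted in the text

# Minimum erasure dose of 15 Wsec/cm2: 1250 s, i.e. roughly 20 minutes.
print(exposure_time(15, LAMP) / 60)

# One week of exposure: about 7258 Wsec/cm2, the stated damage limit.
print(integrated_dose(LAMP, 7 * 24 * 3600))
```

The one-week figure works out to 7257.6 Wsec/cm2, which rounds to the 7258 Wsec/cm2 limit quoted in the text.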


[Timing diagram: burst write transaction, including the AD0-AD3 and I0-I7 inputs. 290192-7]


Figure 7. Burst Write Transaction (1-0-0-0)


[Circuit diagram: 85C960 open-drain output with external pullup and pulldown resistors; IOL = 28.8 mA, VOH = 3.0V. 290192-8]


Figure 8. RDY# Pullup/Pulldown Resistors


IN-CIRCUIT RECONFIGURATION 


The 85C960 allows in-circuit configuration changes
after the device has powered up. At power-up, the
device is configured according to the information
programmed into the EPROM cells. After power-up,
new information can be shifted in on select pins to
alter device configuration. The new configuration is
retained until the device is powered down or until the
information is overwritten by another configuration
change.


Note that in-circuit configuration changes allow "on-the-fly"
changes to be made, but do not alter
EPROM cell data. At the next power-up, the device
will be configured according to the original data programmed
into the EPROM cells. In-circuit reconfiguration
requires additional circuitry external to the
85C960. For details on in-circuit configuration
changes, refer to AP-337, In-Circuit Reconfiguration
of 85C960 and 85C508 µPLDs, order number
292072.


ORDERING INFORMATION


80960KA/KB
Clock Frequency    µPLD Order Code    Package    Operating Range
20 MHz             *D85C960-20        CERDIP     Commercial
                   N85C960-20         PLCC       Commercial
25 MHz             *D85C960-25        CERDIP     Commercial
                   N85C960-25         PLCC       Commercial

*Only windowed CERDIP allows UV-erase.


DESIGN SOFTWARE 


Software support is provided by version 2.1 (or later)
of iPLS II (Intel Programmable Logic Software II).
Programming is supported on the iUP-PC PC-based
programmer or iUP-200A/201A Universal Programmer
via the GUPI base module and the GUPI
85EPLD28 programming adaptor.


For detailed information on iPLS II, refer to the
iPLDS II Data Sheet, order number 290134. The
tools section of the Programmable Logic handbook
contains a complete listing of all design tools for Intel
EPLDs.
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ABSOLUTE MAXIMUM RATINGS* 


Supply Voltage (Vcc)
Programming Supply Voltage (Vpp) ........ -2.0V to +13.5V
D.C. Input Voltage (VI)(1, 2) ........... -0.5V to Vcc + 0.5V
Storage Temperature (Tstg) .............. -65°C to +150°C
Ambient Temperature (TA)(3) ............. -10°C to +85°C

*WARNING: Stressing the device beyond the "Absolute
Maximum Ratings" may cause permanent damage.
These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended
exposure beyond the "Operating Conditions"
may affect device reliability.

NOTES:
1. Voltages with respect to GND.
2. Minimum D.C. input is -0.5V. During transitions, the inputs
may undershoot to -2.0V or overshoot to +7.0V for
periods of less than 20 ns under no load conditions.
3. Under bias. Extended Temperature versions are also
available.


RECOMMENDED OPERATING CONDITIONS 


Ww tnputvotage doe 
i 
TA 


| Vo] _—_Output Voltage 
— eae Operating Temperature 


intel. | 56960 


Symbol     Parameter
VIH1(4)    High Level Input Voltage (All Inputs except for
           ADS#, AD0-AD3, DEN#, and W/R#)
VIH2       High Level Input Voltage for ADS#, AD0-AD3,
           DEN#, and W/R#
VIL        Low Level Input Voltage
VOH        High Level Output Voltage    2.4    IOH = -4.0 mA D.C., Vcc = Min.
VOL1       Low Level Output Voltage
VOL2       Low Level Output Voltage for A2, A3
VOL3       Low Level Output Voltage for Open Drain (RDY#)
II         Input Leakage Current
IOZ        Output Leakage Current    GND < VOUT < Vcc


Isc) Output Short Circuit Current | —3 Voc = Max., Vout = 0.5V — 


wae Power oueee Current Voc = Max., Vin = Voc or GND, 


No Load, CLK2 = .50 MHz 
NOTES: 


4. Absolute values with respect to device GND; all over and undershoots due to system or tester noise are included. 
5. Not more than 1 output should be tested at a time. Duration of that test should not exceed 1 second. 


lop = 4.0 mA D.C., Voc = Min., 


C, = 30 pF 
lo = = 24mAD.C., Voc = Min., 


lon = 30 mA D.C., Voc = Min., 
CL = 


© . 
-) © 
\ 


A.C. TESTING LOAD CIRCUIT (RDY#)
A.C. TESTING LOAD CIRCUIT (ALL OUTPUTS EXCEPT RDY#)

[Figures 290192-9 and 290192-18: load circuit schematics for the 85C960 outputs]

See D.C. Characteristics Table for Current and Capacitance Specifications.
D1 and D2 are matched. D3 and D4 are matched.
A.C. TESTING WAVEFORM: SYNCHRONOUS INPUTS AND OUTPUTS

[Figure 290192-10: INPUT (SETUP AND HOLD) and OUTPUTS waveforms]

A.C. Testing: Inputs are driven at 2.4V for a Logic "1" and 0.4V for a Logic "0". CLK2 is driven at
3.0V for a Logic "1" and 0.45V for a Logic "0". Timing Measurements made relative to CLK2 are made
from 1.5V on CLK2. Inputs and outputs are measured at 2.0V for a high and 0.8V for a low. Device
input rise and fall times are less than 3 ns.


[Figure 290192-11: INPUTS and OUTPUTS waveforms]

A.C. Testing: Inputs are driven at 2.4V for a Logic "1" and 0.4V for a Logic "0". Input timing is
measured at 1.5V for high-to-low and low-to-high transitions. Outputs are measured at 2.0V for a
high and 0.8V for a low. Device input rise and fall times are less than 3 ns.
A.C. CHARACTERISTICS (TA = 0°C to +70°C, VCC = 5.0V ±5%)

85C960-25

Parameter
Input Setup to CLK2a
Input Hold from CLK2a
CLK2c to RDY# Output Low Delay
CLK2c to RDY# Output High Delay
CLK2a to CS0#-CS3# High Delay
CLK2a to BLAST# High Delay
RESET Low Setup to CLK2a
NOTES:

6. Applies to ADS#, DEN#, W/R#, and AD0-AD3. DEN# is high during the entire Ta state in 80960 KA/KB systems.

7. RDY# is an open-drain output. Specified time includes RDY# output float delay and pull-up/pull-down resistors
(Figure 8). RDY# remains low for a minimum of 10 ns at the start of a Tr state and goes high by CLK2a of the next Tx state.

8. Minimum WCLK# pulse width is one clock period minus 3 ns. For example, at 25 MHz: 20 ns - 3 ns = a 17 ns minimum
WCLK# pulse.

9. Chip Select Decoder latches are transparent flow-through types. Latches open when ADS# is low, DEN# is high, and
CLK2 goes high during the middle of a Tx state (CLK2c). Since DEN# is high during the entire Ta state in 80960 KA/KB
systems, only CLK2c and ADS# are specified.

10. Chip Select Decoder latches are transparent flow-through types. Latches close when ADS# is high or DEN# is low, or
when CLK2 goes high at the start of a Tx state (CLK2a) after the latches have opened. Since ADS# is low and DEN# is
high at the end of a Ta in 80960 KA/KB systems, setup and hold times are specified with reference to CLK2a only.

11. Propagation delay while latches are open (transparent); one output switching (high-to-low).

12. RESET must be held high for a minimum of 4 CLK2 cycles (80960 specifies 41 CLK2 cycles minimum).

13. RESET must hold after the low-to-high transition immediately prior to CLK2a. CLK2a is defined as the first low-to-high
transition after RESET goes low.
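
The arithmetic in Note 8 can be checked with a short sketch. The function name is hypothetical, and the assumption that CLK2 runs at twice the processor frequency is inferred from the 20 ns period quoted for a 25 MHz system; it is not stated in the note itself.

```python
def min_wclk_pulse_ns(cpu_mhz: float, margin_ns: float = 3.0) -> float:
    """Minimum WCLK# pulse width: one CLK2 period minus the 3 ns margin of Note 8."""
    clk2_mhz = 2.0 * cpu_mhz        # assumption: CLK2 is twice the processor clock
    period_ns = 1000.0 / clk2_mhz   # one CLK2 period in nanoseconds
    return period_ns - margin_ns


if __name__ == "__main__":
    print(min_wclk_pulse_ns(25))    # 20 ns period minus 3 ns
```

For a 25 MHz system this reproduces the 17 ns figure given in the note.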


I0-I7 Valid to CS0#-CS3# Valid Delay
I0-I7 Hold from CLK2a
RDY# Input Setup to CLK2d (Write)
CLK2 EDGES


290192-12 


NOTE: 
Minimum CLK2 high and low times are 8 ns measured from 1.5V to 1.5V. 


CAPACITANCE (TA = 0°C to +70°C; VCC = 5.0V ±5%)

Symbol | Parameter           | Typ | Conditions
CIN    | Input Capacitance   | 8   |
COUT   | Output Capacitance  | 6   |
       | CLK2 Capacitance    |     |
       | Vpp Pin Capacitance |     | Vpp on Pin 1 (RESET)
       | RDY# Capacitance    |     |
4 Word Burst Write with 1 Wait State on Each Access
RDY# is Generated by the 85C960
(Same for Read Except WCLK# Stays High)

[Figure 290192-13: timing diagram; signals include CS0-CS3, A2, A3, and BLAST]
WCLK# TIMING


290192-14 


I0-I7 AND CS0#-CS3# TIMING


290192-15 


NOTE: 
CLK2, ADS#, and DEN# generate internal latch enable. See Figure 7 for details. 




3 Word Burst with 0 Wait States on Each Access 
RDY # is Generated Externally 


(WCLK# is Only Generated During Burst Write Transactions) 


290192-16 


RESET INPUT TIMING 


4 CLK2 CYCLES 
(MINIMUM) 


290192-17 
27960CX
PIPELINED BURST ACCESS 1M (128K x 8) CHMOS EPROM

■ Synchronous 4 Byte Data Burst Access
■ No Glue Interface to 80960CA
■ High Performance Clock to Data Out
  - Zero Wait State Data to Data Burst
  - Up to 33 MHz 80960CA Performance
■ Asynch Microcontroller Reset Function
  - Returns to Known State with High-Z Outputs
■ Pipelined Addressing for Optimal Bus Bandwidth on 80960CA
  - Next Addressing Overlaps Last Data Byte
■ CHMOS III-E for High Performance and Low Power
  - 125 mA Active, 30 mA Standby
  - TTL Compatible Inputs
■ 1 Mbit Density Configures as 128K x 8

Intel's 27960CX is a 5V only, 1,048,576 bit, Erasable Programmable Read Only Memory, organized as 128K
words of 8 bits.


The 27960CX provides a no glue synchronous burst interface to the 80960CA bus. Internally the 27960CX is
organized in 4 byte blocks, in which each byte is accessed sequentially. The internal state machine is factory
configured to generate either 1 or 2 wait-states between the address and first data byte. High performance
outputs provide zero wait-state data to data accesses at clock frequencies up to 33 MHz.


Pipelining capability allows addresses to overlap previous data, further optimizing bus bandwidth in 80960CA
applications. An asynchronous microcontroller RESET feature puts the outputs in the high impedance state
and takes the internal state machine to a known state where a new burst access can begin.


The 27960CX is available in 44-lead PLCC package, providing optimum cost effectiveness. 


The 27960CX is manufactured on Intel's 1 micron CHMOS III-E technology. The Quick-Pulse Programming™
algorithm provides fast, reliable programming with throughput under 17 seconds for optimized equipment.


*CHMOS is a Patented Process of Intel Corporation. 


Figure 1. 27960CX Burst EPROM Block Diagram 


X PREDECODER
290236-1


September 1991 
Order Number: 290236-006 
27960CX BURST EPROM


EPROMs are established as the preferred code stor- 
age device in embedded applications. The non-vola- 
tile, flexible, reliable, cost effective EPROM makes a 
product easier to design, manufacture and service. 
Until recently, however, EPROMs could not match 
the performance needs of high-end systems. The 
27960CX was designed to support the 80960CA em- 
bedded processor. It utilizes the burst interface to 
offer near zero wait-state performance without the 
high cost normally associated with this performance. 


In embedded designs, board space and cost must be kept at a minimum without impacting
performance and reliability. The 27960CX removes the need for expensive high-speed shadow
RAM backed up by slow EPROM or ROM for non-volatile code storage. Code optimization
concerns are reduced with "off-chip" code fetches no longer crippling to system
performance. FONTs can be run directly out of these EPROMs at the same performance as
high-speed DRAMs. With the 27960CX, the EPROM is the ideal code or FONT storage device
for your 80960CA system.


*CERQUAD is available in a socket only version. 




Figure 2. 27960CX Burst EPROM Signal Set 
Architecture

The 27960CX provides a no-glue, synchronous burst interface to the 80960CA's bus. It
operates in pipelined or non-pipelined modes. Internally, the 27960CX is organized in
4 byte blocks which are accessed sequentially. A burst access begins on the first clock
pulse after ADS and CS are asserted. The address of the 4 byte block is latched on the
rising edge of clock following ADS. After a preset number of wait-states (1 or 2), data
is output one byte at a time on each subsequent clock cycle. A burst access is terminated
on the rising edge of clock with BLAST asserted. High performance outputs provide zero
wait-state data to data accesses at clock frequencies up to 33 MHz. Extra power and
ground pins dedicated to the outputs reduce the effects of fast output switching on
device performance.


The pipelining capability of the 27960CX allows the address to overlap the last data byte
of the burst, further optimizing bus bandwidth in 80960CA applications. In the pipelined
mode, with a non-buffered interface, the 27960CX delivers 4 bytes of data in 6 clock
cycles at 33 MHz. In a 32-bit configuration, this translates into a read bandwidth of
88 Mbytes/sec. Performance capability of the 27960CX in different 80960CA systems is
given in Table 1.
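
The 88 Mbytes/sec figure follows directly from the burst length and clock rate. The sketch below is only a worked check of that arithmetic; the function and parameter names are illustrative and do not come from the data sheet.

```python
def burst_bandwidth_mbytes(clock_mhz: float, clocks_per_burst: int,
                           words_per_burst: int = 4, bytes_per_word: int = 4) -> float:
    """Read bandwidth of a 32-bit EPROM bank, in Mbytes/sec.

    Four 8-bit devices side by side deliver one 4-byte word per data cycle,
    so a 4-word burst moves 16 bytes in `clocks_per_burst` clocks.
    """
    bytes_per_burst = words_per_burst * bytes_per_word
    return bytes_per_burst * clock_mhz / clocks_per_burst


if __name__ == "__main__":
    print(burst_bandwidth_mbytes(33, 6))   # pipelined, 2 wait states, 33 MHz
```

The same formula gives roughly 66 and 51 Mbytes/sec for 6 clocks at 25 MHz and 5 clocks at 16 MHz. Pairing those clock rates with the buffered entries of Table 1 is an assumption based on the device versions listed later in this data sheet.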


[Figure 290236-2: 27960CX BURST EPROM, 128K x 8]
Table 1. Performance Capability

33 MHz, 2 WS, Non-Buffered: 4 Words/6 Clock Cycles -> 88 Mbytes/Sec
2 WS, Buffered: 4 Words/6 Clock Cycles -> 66 Mbytes/Sec
1 WS, Buffered: 4 Words/5 Clock Cycles -> 51 Mbytes/Sec


N27960CX


44 LEAD PLCC 


0.650" x 0.650" 
TOP VIEW 


Figure 3. 27960CX 44 Lead PLCC Pinout 
PIN DESCRIPTIONS

Function

ADDRESS INPUTS: During a burst operation, A2-A16 provide the base address pointing to a
block of four consecutive bytes. A0 and A1 select the first byte of the burst access.
The 27960CX latches addresses in the first clock cycle. An internal address generator
increments addresses A0 and A1 for subsequent bytes of the burst.

DATA INPUTS/OUTPUTS (pins 18, 17, 14, 13, 11, 10, 7, 6)

ADDRESS STROBE: Indicates the start of a new bus access. ADS is active low in the first
clock cycle of a bus access.

CHIP SELECT: Master device enable. When asserted (active low) data can be written to and
read from the device. In read mode, CS enables the state machine and the I/O circuitry.
NOTE:
1. The address decode path is independent of CS, i.e., X and Y decoding is always
   powered up.
2. For programming, CS should remain low for the entire cycle. Program and verify
   functions are done one byte at a time.
3. CS going high does not terminate a concurrent burst cycle.

BURST LAST: Terminates a concurrent burst data cycle at the rising edge of the CLK. It
must be asserted by the fourth data byte.

RESET: Resets the state machine into a known state, tri-states the outputs. RESET must be
asserted for a minimum of 10 clock cycles. At least 5 clock cycles are required after
deassertion of RESET before beginning the next cycle. RESET will abort a concurrent bus
cycle.

PROGRAM-PULSE CONTROL INPUT

PROGRAMMING POWER SUPPLY

15, 19, 21

SUPPLY VOLTAGE INPUT (pins 9, 16, 20, 44)
INTERFACE EXAMPLE


Overview 


This example illustrates 8-, 16- and 32-bit wide 
27960CX interfaces to the 80960CA. The designs 
offer a simple ‘‘no-glue”’ interface. 


A non-buffered 27960CX system organized as 256K x 32 is shown in Figure 4A. Since the
27960CX is capable of driving a 80 pF load, large, non-buffered systems can be
implemented by stacking up to 2 banks of 4 EPROMs, resulting in a 256K x 32 memory
subsystem. The input capacitive load seen on the address lines (due to the EPROM only)
is 24 pF for a 128K x 32 system and 48 pF for a 256K x 32 system. The EPROM is specified
at 6 pF for input capacitance (15 pF max) and 12 pF typical for output capacitance.
Larger systems can be implemented with buffers (Figure 4B).
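
The address-line loading figures quoted above are simply the per-device input capacitance multiplied by the number of EPROMs sharing each address line. A minimal sketch of that check follows; the constant and function names are made up for illustration, and only the EPROM contribution is counted, as in the text.

```python
EPROM_INPUT_PF = 6        # typical input capacitance per 27960CX, from the text
DEVICES_PER_BANK = 4      # four 128K x 8 devices form one 32-bit bank


def address_line_load_pf(banks: int, per_device_pf: float = EPROM_INPUT_PF) -> float:
    """Capacitive load on one address line due to the EPROMs alone."""
    return banks * DEVICES_PER_BANK * per_device_pf


if __name__ == "__main__":
    for banks in (1, 2):
        size_k = 128 * banks
        print(f"{size_k}K x 32: {address_line_load_pf(banks)} pF")
```

One bank gives 24 pF and two banks give 48 pF, matching the 128K x 32 and 256K x 32 figures. Board trace and socket capacitance are not modeled here.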


Chip Select Logic 


High order address lines are decoded to provide CS. 
Qualification with other signals is not required. The 
chip select logic can be implemented with standard 
asynchronous decoders, PAL’s or PLD’s (like Intel’s 
85C508). 


290236-4 


290236-—5 


Figure 4B. Buffered Burst EPROM Memory System 
Schematics


Figure 5 shows a non-buffered, 128K x 32 27960CX 
EPROM system. 


Chip select logic, the only external logic that is required for this interface, can be
derived from the global system chip select circuitry.


[Schematic: 80960CA, DECODER (85C508), ADDRESS, CS, and 27960CX 128K x 8 devices]
In a non-buffered, 16-bit system (Figure 6A) BE1 and A2 connect to the lower order
address bits of the 27960CX. BE1 connects to A0 of both EPROMs, while A2 connects to
both A1's.

In a non-buffered, 8-bit system (Figure 6B) BE0 and BE1 connect to A0 and A1
respectively.


290236-6

Figure 6A. 27960CX Burst EPROM in a 16-Bit System
[Schematic: 80960CA (ADDRESS, ADS, BLAST, CLK), DECODER (85C508), CS, A2-A16, and 27960CX 128K x 8]

Figure 6B. 27960CX Burst EPROM in an 8-Bit System


Waveforms 


Figure 7 shows the timing waveforms of a 27960CX 
pipelined read in a 32-bit system. 


CS Setup Time 


CS setup time is the time between CS being asserted and the first CLK rising edge (during
the address cycle). Since a memory access begins on the first CLK rising edge after ADS
and CS are asserted, a minimum CS setup time of 7 ns (tSVCH) at 33 MHz is required. With
the 80960CA's maximum valid address delay of 14 ns at 33 MHz, 9 ns remains for CS
decoding logic.


Bootup — 


The wait state configuration (1 or 2), of the 27960CX 
is programmed by the user into the 80960CA Region 
Table parameters of NRAD, NRDD, and NXDA. 
NRDD is always 0 for the 27960CX. 


Clock:   0 1 2 3 4 5 6 7 8 9 10 11 12 13
State:   A WS WS D D D A/D WS WS D D D A/D

NRAD = 2
NRDD = 0
NXDA = 0

PIPELINED BURST READ

NOTES:
1. The EPROM can also operate in non pipelined mode, i.e., next address and ADS can be asserted in the clock cycle
following the last data word of the burst.
2. 2-0-0-0 Burst Read: 2 indicates the number of wait states to access the first word; 0's indicate the number of
wait states for subsequent data words: 0 in this case.

Figure 7. Two Cycles of a 27960CX 2 Wait State 4 Byte Read (2-0-0-0 Burst Read) in a 32 Bit System
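
The state sequence of Figure 7 can be reproduced with a small generator. This is a sketch only: the function names are hypothetical, and the model assumes every burst is followed immediately by another pipelined access, so the last data cycle always doubles as the next address cycle.

```python
from typing import List


def pipelined_burst_states(wait_states: int, bursts: int, words: int = 4) -> List[str]:
    """Per-clock bus states for back-to-back pipelined burst reads."""
    states = ["A"]                          # first address cycle
    for _ in range(bursts):
        states += ["WS"] * wait_states      # wait states before the first data
        states += ["D"] * (words - 1)       # first words of the burst
        states.append("A/D")                # last word overlaps the next address
    return states


def clocks_per_burst(wait_states: int, words: int = 4) -> int:
    """Clocks consumed by each burst once the pipeline is running."""
    return wait_states + words


if __name__ == "__main__":
    print(" ".join(pipelined_burst_states(wait_states=2, bursts=2)))
    print(clocks_per_burst(2), "clocks per 4-word burst")
```

With two wait states this yields the 6-clock burst used in the performance figures; with one wait state it yields 5 clocks.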


During boot-up (Figure 8), the 80960CA picks up its Region Table data from addresses
FFFF FF00; FFFF FF04; FFFF FF08 and FFFF FF0C. Only the least significant byte of each
of the above four 32-bit accesses is used to configure the Region Table. For boot-up,
the wait-state parameters NRAD and NXDA default to 31 and 3 respectively. During boot-up,
the 27960CX will wrap around the first word of the four-word block and hold the first
word until BLAST is asserted.


27960CX DEVICE NAMES

The device names on the 27960CX were derived as mnemonics that correspond to the number
of wait states and expected operating frequency for the device. For example, the 25 MHz,
2 wait state 27960CX is named 27960C2-25.


AC TIMING DERIVATIONS

The AC timings for the 27960CX were generated specifically to meet the requirements of
the 80960CA microprocessor. In each case the applicable 80960CA clock frequency and AC
timing were taken together with an address buffer delay (if needed) and a typical 2 ns
guardband to generate the 27960CX AC timing. Worst case timings were always assumed. On
timings where the EPROM is faster than the microprocessor, we specified the time required
by the EPROM and left the excess time as additional system guardband. The example below
shows how the 27960C2-33 tAVCH timing was derived.

@33 MHz the clock cycle is ~30 ns.
tOV1 of the 80960CA is 3 ns - 14 ns.
Typical 2 ns guardband.

27960C2-33 tAVCH = 30 ns - 14 ns - 2 ns
                 = 14 ns

Decoders are needed for the system's chip select decoding. For the 27960CX timings we
assumed a 10 ns chip select decoder for 16 MHz and a 7 ns decoder for 25 MHz and 33 MHz
systems. The example below shows how the 27960C2-33 tSVCH timing was derived.

@33 MHz the clock cycle is ~30 ns.
tOV1 of the 80960CA is 3 ns - 14 ns.
Decoder = 7 ns

27960C2-33 tSVCH = 30 ns - 14 ns - 7 ns
                 = 9 ns
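
Both derivations above subtract the worst-case processor delay and one further allowance (guardband or decoder delay) from the clock period. The helper below is an illustrative sketch of that subtraction; its name and signature are assumptions, not part of the data sheet.

```python
def derived_timing_ns(clock_period_ns: float, cpu_delay_ns: float,
                      *other_delays_ns: float) -> float:
    """Time left for the EPROM after the CPU delay and any other allowances."""
    return clock_period_ns - cpu_delay_ns - sum(other_delays_ns)


if __name__ == "__main__":
    t_avch = derived_timing_ns(30, 14, 2)   # 2 ns guardband
    t_svch = derived_timing_ns(30, 14, 7)   # 7 ns chip select decoder
    print(f"tAVCH = {t_avch} ns, tSVCH = {t_svch} ns")
```

The variadic argument lets further allowances, such as an address buffer delay, be added without changing the function.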
BINARY SEQUENCE FROM A0 TO A16
290236-12
R = 1 kΩ     CS = GND
GND = 0V     CLK = 1 MHz
Vpp = +5V
VCC = +5V

Figure 9. 27960CX Burn in Biasing Diagram


System Buffering Considerations

For large system applications buffering may be required between the microprocessor and
memory devices. The 25 and 16 MHz 27960CX AC timings take this into account. For
applications not requiring buffering these devices will provide additional system
guardband.

The list below shows the buffers used in generating the 27960CX timings:

           Input Buffer    Output Buffer
25 MHz     8 ns            5 ns
16 MHz     10 ns           7 ns

Note that the 25 MHz buffers are slightly faster in keeping with the increased
sensitivity for higher performance. Significantly faster buffers are available for
applications requiring them. The example below shows the tCHQV timing analysis for a
buffered 27960C2-25.

@25 MHz the clock cycle is ~40 ns.
tIS1 of the 80960CA is 5 ns.
Output buffer for 25 MHz = 5 ns

27960C2-25 tCHQV = 40 ns - 5 ns - 5 ns
                 = 30 ns
ABSOLUTE MAXIMUM RATINGS*

Read Operating Temperature ................... 0°C to +70°C(8)
Case Temperature Under Bias .................. -10°C to +80°C(8)
Storage Temperature .......................... -65°C to +125°C
All Input or Output Voltages
  with Respect to Ground ..................... -0.6V to +6.5V(4)
Voltage on A9
  with Respect to Ground ..................... -0.6V to +13.0V(4)
Vpp Supply Voltage
  with Respect to Ground ..................... -0.6V to +14.0V(4)
VCC Supply Voltage
  with Respect to Ground ..................... -0.6V to +7.0V(4)

NOTICE: This data sheet contains preliminary information on new products in production.
The specifications are subject to change without notice. Verify with your local Intel
Sales office that you have the latest data sheet before finalizing a design.

*WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause permanent
damage. These are stress ratings only. Operation beyond the "Operating Conditions" is not
recommended and extended exposure beyond the "Operating Conditions" may affect device
reliability.


READ OPERATION


DC CHARACTERISTICS 0°C < TA < +70°C, VCC = 5V ±10%, TTL Inputs

Symbol | Parameter            | Notes   | Conditions
ICC    | VCC Active Current   | 1, 3, 7 | f = 33 MHz
       | Input Low Voltage    |         |
VOH    | Output High Voltage  |         |
IOS    | Output Short Circuit |         |

NOTES:
1. Maximum current is with outputs unloaded.
2. ICC standby current assumes no output loading, i.e., IOH = IOL = 0 mA.
3. ICC is the sum of current through VCC3 + VCC4 and does not include the current through VCC1 and VCC2. (VCC1 and
VCC2 supply power to the output drivers. VCC3 and VCC4 supply power to the rest of the device.)
4. Minimum DC input voltage on input and output pins is -0.5V. During transitions, this level may undershoot to -2.0V for
periods less than 20 ns.
5. Maximum DC voltage on input and output pins is VCC + 0.5V which may overshoot to VCC + 2.0V for periods less than
20 ns.
6. One output shorted for no more than one second. IOS is sampled but not 100% tested.
7. ICC max measured with a 10.11 µF capacitor between VCC and VSS.
8. This specification defines commercial product operating temperatures.
EXPLANATION OF AC SYMBOLS

The nomenclature used for timing parameters is as per IEEE STD 662-1980 IEEE Standard
Terminology for Semiconductor Memory.

Each timing symbol has five characters. The first is always a "t" (for time). The second
character represents a signal name, e.g., (CLK, ADS, etc.). The third character
represents the signal's level (high or low) for the signal indicated by the second
character. The fourth character represents a signal name at which a transition occurs
marking the end of the time interval being specified. The fifth character represents the
signal level indicated for the fourth character. The list below shows character
representations.

A: Address                        R: Reset
B: BLAST                          Q: Data
C: Clock                          S: CS
H: Logic High Level               t: Time
L: ADS/Logic Low Level            V: Valid
P: Vpp Programming Voltage        Z: Tri-state Level
X: No longer a valid "driven" logic level
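
The five-character scheme described here can be expanded mechanically. The sketch below is illustrative: splitting the character list into a signal table and a level table is an assumption made for the example (the data sheet gives a single list, with "L" serving as both ADS and Logic Low), and the function name is hypothetical.

```python
# Characters that name a signal (positions 2 and 4 of the symbol).
SIGNALS = {"A": "Address", "B": "BLAST", "C": "Clock", "L": "ADS",
           "P": "Vpp", "Q": "Data", "R": "Reset", "S": "CS"}

# Characters that name a level or state (positions 3 and 5 of the symbol).
LEVELS = {"H": "High", "L": "Low", "V": "Valid",
          "X": "Invalid", "Z": "Tri-state"}


def decode_ac_symbol(symbol: str) -> str:
    """Expand a five-character timing symbol such as 'tAVCH' into words."""
    if len(symbol) != 5 or symbol[0] != "t":
        raise ValueError("expected 't' followed by four characters")
    _, sig1, lvl1, sig2, lvl2 = symbol
    return f"{SIGNALS[sig1]} {LEVELS[lvl1]} to {SIGNALS[sig2]} {LEVELS[lvl2]}"


if __name__ == "__main__":
    for name in ("tAVCH", "tSVCH", "tCHQV"):
        print(name, "=", decode_ac_symbol(name))
```

For example, tAVCH reads as "Address Valid to Clock High" and tCHQV as "Clock High to Data Valid".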


AC CHARACTERISTICS: READ OPERATION 0°C < TA < +70°C, VCC = 5V ±10%

Versions       27960C2-33      27960C2-25      27960C1-16
Frequency      33 MHz          25 MHz          16 MHz
Wait States    2 Wait State    2 Wait State    1 Wait State


NOTES:

1. Valid signal level is meant to be either a logic high or logic low.

2. The subscript N represents the number of wait states for this parameter. CS can be de-asserted (high) after the number
of wait states (N) has expired and the EPROM will continue to burst out data for the current cycle.

3. BLAST# must be returned high before the next rising clock edge.

4. The sum of tCHQV + tAVCH + N CLK will not equal actual tAVQV if independent test conditions are used to obtain tAVCH
and tCHQV (N = number of wait states).

5. ADS must be returned high before the next rising clock edge.

6. Sampled, not 100% tested. The transition is measured ±500 mV from steady state voltage.

7. For capacitive loads above 80 pF, tCHQV can be derated by 1 ns/20 pF.
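
Note 7 can be applied as a simple linear adjustment. The sketch below assumes the derating scales proportionally with the excess load above 80 pF and is added to the specified tCHQV; the data sheet does not say whether partial 20 pF steps should be rounded, so that choice, like the function name, is an assumption.

```python
def derated_tchqv_ns(spec_tchqv_ns: float, load_pf: float,
                     rated_load_pf: float = 80.0,
                     ns_per_20pf: float = 1.0) -> float:
    """tCHQV adjusted for capacitive loads above the rated 80 pF (Note 7)."""
    excess_pf = max(0.0, load_pf - rated_load_pf)    # no credit for lighter loads
    return spec_tchqv_ns + ns_per_20pf * excess_pf / 20.0


if __name__ == "__main__":
    # Example values only: a 30 ns tCHQV driving a 120 pF load.
    print(derated_tchqv_ns(30.0, 120.0))
```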
Figure 10. 27960CX Pipelined 2 Wait State AC Waveforms

AC CONDITIONS OF TEST
Input Rise and Fall Times (10% to 90%) ....... 4 ns
Input Pulse Levels ........................... 0.45V to 2.4V
Input Timing Reference Level ................. 1.5V
Output Timing Reference Level ................ 1.5V
Table 2. Mode Table

Mode
Program Inhibit
ID Byte 0: Manufacturer
ID Byte 1: Part (27960)
ID Byte 3: 1 Wait State
ID Byte 3: 2 Wait States

NOTES:

1. VIH until data terminated at which time BLAST must go to VIL.

2. Need to toggle from VIH to VIL to VIH.

3. See DC Programming Characteristics for VCC, VID and Vpp voltages.

4. X can be VIL or VIH.

5. Vpp = VCC to meet standby current specification. VCC > Vpp > VIL will cause a slight increase in standby current.
6. The device must be in the idle state (by asserting RESET or using BLAST) before going into standby.


CAPACITANCE(1) TA = 25°C, f = 1.0 MHz

Input Capacitance
Output Capacitance
Vpp Capacitance

NOTE:
1. Sampled. Not 100% tested.
AC TESTING LOAD CIRCUIT

[Figures 290236-14 and 290236-15: DEVICE UNDER TEST load circuit, and TIMING PARAMETER
waveform showing VOH, VOL and the 1.5V OUTPUT reference]

CL includes jig capacitance.
For tCHQZ, CL = 5 pF and RL = 405Ω.

Input and output timings are measured from 1.5V.
Timing values are specified assuming maximum input
and output rise and fall time = 4 ns.


CLOCK CHARACTERISTICS 


Fall Time 


Low Time 


Max Rise Time for Programming CLK = 100 ns 


CLOCK WAVEFORM 


290236-16 
Program/Program Verify

Initially, and after each erasure, all bits of the EPROM are in the "1's" state. Data is
introduced by selectively programming "0's" into the desired bit locations. Although only
"0's" can be programmed, both "1's" and "0's" can be present in the data word.
Ultraviolet erasure is the only way to change "0's" to "1's".

Programming mode is entered when Vpp is raised to 12.75V. Program/Verify operation is
synchronous with the clock and can only be initiated following an idle state. Program and
Program Verify take place in 3 clock cycles. In the first clock cycle, addresses and data
are input and programming occurs. Program Verify follows in the second clock cycle and
the third clock cycle terminates synchronous Program/Verify operation, returning the
state machine to the idle state with outputs at high impedance.

As in the Read mode, A2-A16 point to a four byte block in the memory array. During
programming, the internal address increment circuitry is disabled and the programmer must
supply A0 and A1 to point to an individual byte within the four byte block that is to be
programmed. Only one byte is programmed in each 3 cycle Program/Verify sequence.


Program Inhibit

The Program Inhibit mode allows parallel programming and verification of multiple devices
with different data. With Vpp at 12.75V, a Program/Verify sequence is initiated for any
device that receives a valid ADS pulse and rising clock edge while CS is asserted. A PGM
pulse programs data in the first cycle of the sequence and data for Program Verify is
output in the second cycle. The Program/Verify sequence is inhibited on any devices for
which CS is not asserted. Data will not be programmed and the outputs will remain in
their high impedance state.


inteligent Identifier™ Mode

The device's manufacturer, product type, and configuration are stored in a four byte
block that can be accessed by using the inteligent Identifier™ mode. The programmer can
verify the device identifier and choose the programming algorithm that corresponds to the
Intel 27960CX. The inteligent Identifier can also be used to verify that the product is
configured with the desired Read mode options for wait states.

inteligent Identifier mode is entered when A9 (pin 32) is raised to its high voltage
(VID) level. The internal state machine is then set for inteligent Identifier Read
operation. Reading the identifier is similar to a Read operation on a one wait state
configured product. Up to four bytes can be read in a single burst access. inteligent
Identifier read is terminated by a synchronous BLAST input, returning the state machine
to the idle state with outputs at high impedance.

The four byte block code for the inteligent Identifier code is located at address 00H
through 03H and is encoded as follows:

MEANING          (A1, A0)    DATA
Intel ID         Byte 00     89h
27960            Byte 01     E0h
CX               Byte 10     01b
1 Wait State     Byte 11     01b
2 Wait States    Byte 11     10b
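
A programmer could interpret the four identifier bytes along the lines of the sketch below. It is an illustration only: the function name and returned field names are invented, and comparing just the low two bits of bytes 2 and 3 is an assumption made because the table gives those values in binary.

```python
def decode_identifier(block: bytes) -> dict:
    """Interpret the four-byte identifier block read at addresses 00H-03H."""
    if len(block) != 4:
        raise ValueError("identifier block must be exactly four bytes")
    wait_state_codes = {0b01: 1, 0b10: 2}
    return {
        "is_intel": block[0] == 0x89,                         # manufacturer code
        "is_27960": block[1] == 0xE0,                         # part code
        "is_cx": (block[2] & 0b11) == 0b01,                   # CX variant
        "wait_states": wait_state_codes.get(block[3] & 0b11)  # None if unrecognized
    }


if __name__ == "__main__":
    print(decode_identifier(bytes([0x89, 0xE0, 0x01, 0x02])))
```

Returning None for an unrecognized wait-state code lets the caller reject a device rather than guess its configuration.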
RESET MODE

Due to the synchronous nature of the 27960CX, the various operating modes must be
initiated from a known idle state. During normal operation, the internal state machine
returns to an idle state at the termination of a bus access (after BLAST is asserted).
During initial device power up, the state machine is in an indeterminate state. The reset
mode is provided to force operation into the idle state. Reset mode is entered when the
RESET pin is asserted. Output pins are asynchronously set to the high impedance state and
address latches are put into the flow through mode. A reset is successfully completed and
the state machine set in an idle state when RESET has been asserted for a minimum of 10
clock cycles and deasserted for five clock cycles.
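
A system or programmer controller can express the reset requirement as a simple check on clock counts. This is a minimal sketch with hypothetical names; it models only the two cycle counts stated above.

```python
MIN_RESET_ASSERTED_CLOCKS = 10    # RESET must stay asserted at least this long
MIN_RESET_RECOVERY_CLOCKS = 5     # clocks required after deassertion


def reset_complete(asserted_clocks: int, deasserted_clocks: int) -> bool:
    """True once the state machine can be assumed to be in the idle state."""
    return (asserted_clocks >= MIN_RESET_ASSERTED_CLOCKS
            and deasserted_clocks >= MIN_RESET_RECOVERY_CLOCKS)


if __name__ == "__main__":
    print(reset_complete(10, 5))   # meets both minimums
    print(reset_complete(10, 4))   # recovery time too short
```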
[Flowchart: ADDRESS = FIRST LOCATION; VCC = 6.25V, Vpp = 12.75V; X = 0; INCREMENT X;
LAST ADDRESS?; INCREMENT ADDRESS; DEVICE FAILED; VCC = 5.0V, Vpp = 12.75V; COMPARE ALL
BYTES TO ORIGINAL DATA; PASS; DEVICE PASSED]

Figure 11. Quick-Pulse Programming™ Algorithm
QUICK-PULSE PROGRAMMING™ ALGORITHM

The Quick-Pulse Programming algorithm programs Intel's 27960CX. Developed to
substantially reduce programming throughput time, this algorithm allows optimized
equipment to program a 27960CX in under 17 seconds. Actual programming time depends on
the programmer used.

The Quick-Pulse Programming algorithm uses a 100 µs pulse followed by a byte verification
to determine when the addressed byte is correctly programmed. The algorithm terminates if
25 100 µs pulses fail to program a byte. Figure 11 shows the 27960CX Quick-Pulse
Programming algorithm flowchart.

The entire program-pulse/byte-verify sequence is performed with VCC = 6.25V and
Vpp = 12.75V.

The program equipment must establish VCC before applying voltages to any other pins. When
programming is complete, all bytes should be compared to the original data with
VCC = 5.0V and Vpp = 12.75V.
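
The control flow described above can be sketched in software. The example below is not programmer firmware: `SimulatedEprom` is a hypothetical stand-in for real hardware, supply voltages and pulse timing are not modeled, and only the pulse/verify loop and the 25-pulse limit come from the text.

```python
MAX_PULSES = 25   # the algorithm gives up after 25 pulses on one byte


class SimulatedEprom:
    """Hypothetical device model: a byte 'takes' after a fixed number of pulses."""

    def __init__(self, size: int, pulses_needed: int = 1):
        self.cells = [0xFF] * size          # erased state is all 1's
        self.pulses_needed = pulses_needed
        self._pulse_count = {}

    def pulse(self, addr: int, data: int) -> None:
        """Apply one (simulated) 100 us program pulse."""
        count = self._pulse_count.get(addr, 0) + 1
        self._pulse_count[addr] = count
        if count >= self.pulses_needed:
            self.cells[addr] &= data        # programming can only clear bits

    def read(self, addr: int) -> int:
        return self.cells[addr]


def quick_pulse_program(dev: SimulatedEprom, data) -> bool:
    """Pulse and verify each byte, then compare all bytes to the original data."""
    for addr, byte in enumerate(data):
        for _ in range(MAX_PULSES):
            dev.pulse(addr, byte)
            if dev.read(addr) == byte:      # byte verify after each pulse
                break
        else:
            return False                    # 25 pulses failed: device failed
    return all(dev.read(a) == b for a, b in enumerate(data))


if __name__ == "__main__":
    image = [0x12, 0x34, 0x56, 0x78]
    print(quick_pulse_program(SimulatedEprom(4, pulses_needed=3), image))
```

A real programmer would switch VCC from 6.25V to 5.0V before the final compare, which this model omits.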


D.C. PROGRAMMING CHARACTERISTICS TA = 25°C ±5°C

Symbol | Parameter
VID    | A9 inteligent Identifier Voltage
       | Supply Voltage (Program)
       | Program Voltage

NOTES:
1. The maximum current value is with outputs unloaded.
2. VCC must be applied simultaneously or before Vpp and removed simultaneously or after Vpp.
3. During programming clock levels are VIH and VIL.
A.C. PROGRAMMING, RESET AND ID CHARACTERISTICS TA = 25°C ±5°C

Parameter
Address Valid to PGM Low
CLK High to Address Invalid
ADS Low to CLK High
CLK High to ADS High
CS Valid to CLK High
CLK High to CS Invalid
CLK High to DOUT Valid
CLK High to DOUT Invalid
BLAST Valid to CLK High
CLK High to BLAST Invalid
DATA Valid to PGM Low
tPLPH: PGM Program Pulse Width
PGM High to DIN Invalid
CLK Low to PGM Low
DIN Tri-State to CLK High
VCC Program Voltage to CLK High
Vpp Program Voltage to CLK High
A9 VID Voltage to CLK High
CLK High to A9 Not VID Voltage
RESET Valid to CLK High
CLK High to CLK Low
CLK Low to CLK High

NOTES:
1. If CS is low, ADS can go low no sooner than the falling edge of the previous CLK.
2. ADS must return high prior to the next rising edge of clock.
3. CS must remain low until after the rising edge of CLK1.
4. BLAST must return high prior to the next rising edge of CLK.
5. Max CLK rise/fall time is 100 ns.
6. RESET must be low for 10 clock cycles and high for 5 clock cycles.
7. VCC must be applied simultaneously or before Vpp and removed simultaneously or after Vpp.
Figure 12. 27960CX Programming Waveforms

Figure 13. 27960CX RESET and ID Waveforms
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27960KX 
BURST ACCESS 1M (128K x 8) CHMOS EPROM 


■ Synchronous 4-Byte Data Burst Access
■ Simple Interface to the 80960KA/KB
■ High Performance Clock to Data Out
  - Zero Wait State Data-to-Data Burst
  - Supports 16, 20 and 25 MHz 80960KA/KB Devices
■ Asynchronous Microcontroller Reset Function
  - Returns to Known State with High Z
■ CHMOS* III-E for High Performance and Low Power
  - 125 mA Active, 30 mA Standby
  - TTL Compatible Inputs
■ 1 Mbit Density Configured as 128K x 8


Intel’s 27960KX is a 5V only, 1,048,576 bit, Erasable Programmable Read Only Memory, organized as 128K 
words of 8 bits. 


The 27960KX provides a simple synchronous burst interface to the 80960KA/KB bus. Internally the 27960KX 
is organized in 4 byte blocks, in which each byte is accessed sequentially. The internal state machine is factory 
configured to generate either 1 or 2 wait-states between the address and first data byte. High performance 
outputs provide zero wait-state data to data accesses at clock frequencies up to 25 MHz. 


An asynchronous microcontroller RESET feature puts the outputs in the high impedance state and takes the
internal state machine to a known state where a new burst access can begin. 


The 27960KX is available in a 44-lead PLCC package, providing optimum cost effectiveness.


The 27960KX is manufactured on Intel’s 1 micron CHMOS III-E technology. The Quick-Pulse Programming™ 
algorithm provides fast, reliable programming with throughput under 17 seconds for optimized equipment. 


*CHMOS is a patented process of Intel Corporation. 


[Block diagram labels: X predecoder, Y decoder, sense amplifier]
Figure 1. 27960KX Burst EPROM Block Diagram 


September 1991 
4-40 Order Number: 290237-006 
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27960KX BURST EPROM 


EPROMs are established as the preferred code stor- 
age device in embedded applications. The non-vola- 
tile, flexible, reliable, cost effective EPROM makes a 
product easier to design, manufacture and service. 
Until recently, however, EPROMs could not match 
the performance needs of high-end systems. The 
27960KX was designed to support the 80960KA/KB 
embedded processor. It utilizes the burst interface to 
offer near zero-wait state performance without the 
high cost normally associated with this performance. 


In embedded designs, board space and cost must 
be kept at a minimum without impacting perform- 
ance and reliability. The 27960KX removes the need 
for expensive high-speed shadow RAM backed up 
by slow EPROM or ROM for non-volatile code storage.
Code optimization concerns are reduced, with
“off-chip” code fetches no longer crippling system
performance. FONTs can be run directly out of
these EPROMs at the same performance as high- 
speed DRAMs. With the 27960KX, the EPROM is 
the ideal code or FONT storage device for your 
80960KA/KB system. 
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Architecture 


The 27960KX provides a simple, synchronous burst 
interface to the 80960KA/KB’s bus. Internally, the 
27960KX is organized in 4-byte blocks in which each byte is
accessed sequentially. A burst access begins on the
first clock pulse after CS is asserted. The address of 
the four byte block is latched by the rising edge of 
ALE. After a preset number of wait-states (1 or 2), 
data is output one byte at a time on each subse- 
quent clock cycle. A burst access is terminated on 
the rising edge of CLOCK if BLAST is asserted. High 
performance outputs provide zero wait-state data to 
data accesses at clock frequencies up to 25 MHz. 
Extra power and ground pins dedicated to the out- 
puts reduce the effects of fast output switching on 
device performance. 


The 27960KX delivers 4 bytes of data in 8 clock 
cycles at 25 MHz and 4 bytes of data in 7 clock 
cycles at 20 MHz. In a 32-bit configuration, this 
translates into a read bandwidth of 50 Mbytes/sec 
and 45 Mbytes/sec respectively. Performance capa- 
bility of the 27960KX in different 80960KA/KB sys- 
tems is given in Table 1. 
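
The bandwidth figures above can be reproduced with a short calculation. The sketch below assumes, from the rows of Table 1, that one burst costs one address cycle, N wait-state cycles, four data cycles and one recovery cycle, and that a 32-bit system uses four devices in parallel. The function name and parameters are illustrative only.

```python
def burst_bandwidth_mbytes(clock_mhz, wait_states, devices=4, burst_bytes=4):
    """Read bandwidth in Mbytes/sec for back-to-back bursts."""
    # One address cycle + N wait states + one cycle per byte + one recovery cycle.
    clocks_per_burst = 1 + wait_states + burst_bytes + 1
    # Each device supplies one byte per data cycle, so a 32-bit system moves 16 bytes per burst.
    bytes_per_burst = devices * burst_bytes
    return bytes_per_burst * clock_mhz / clocks_per_burst


for mhz, ws in [(25, 2), (20, 2), (20, 1), (16, 1)]:
    print(f"{mhz} MHz, {ws} WS: {burst_bandwidth_mbytes(mhz, ws):.1f} Mbytes/sec")
```

The data sheet quotes these results rounded down to whole Mbytes/sec (50, 40, 45 and 36).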


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Figure 2. 27960KX Burst EPROM Signal Set 


PIN DESCRIPTIONS 
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N27960KX. 
44 LEAD PLCC 


0.650" x 0.650" 
TOP VIEW 
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Figure 3. 27960KX 44-Lead PLCC Pinout _ 


ADDRESS INPUTS: During a burst operation, A2 through A16 provide the base
address pointing to a block of four consecutive bytes. A0 and A1 select the first
byte of the burst access. The 27960KX latches valid addresses in the first clock
cycle. An internal address generator increments addresses A0 and A1 for
subsequent bytes of the burst.
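
As a rough model of the address generator described above, the sketch below splits a latched address into the block base (the bits above A1) and the first-byte select (A0, A1), then steps the two low bits once per data cycle. Wrapping the 2-bit counter inside the block is an assumption made for illustration; the text does not say what happens when a burst runs past the end of a block.

```python
def burst_byte_addresses(start_address, burst_length=4):
    """Byte addresses a burst would visit, under the wrap assumption above."""
    block_base = start_address & ~0x3   # address bits above A1 select the 4-byte block
    first_byte = start_address & 0x3    # A0 and A1 select the first byte
    # Assumed behaviour: the internal 2-bit counter wraps within the block.
    return [block_base | ((first_byte + i) & 0x3) for i in range(burst_length)]


print([hex(a) for a in burst_byte_addresses(0x100)])
```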


ADDRESS LATCH ENABLE: Indicates the transfer of a physical address. ALE 
is an active low signal used to latch the addresses from the processor. 
Addresses are latched on the rising edge of ALE. Valid addresses must be 
present at or before ALE becomes valid.


CHIP SELECT: Master device enable. When asserted (active low) data can be
written to and read from the device. In read mode, CS enables the state
machine and the I/O circuitry.


NOTES: 
1. The address decode path is independent of CS, i.e., X and Y decoding is
always powered up. 
2. For programming, CS should remain low for the entire cycle. Program and 
verify functions are done one byte at a time. 
3. CS going high does not terminate a concurrent burst cycle. 
4. CS must be deasserted between bursts. 


RESET: Resets the state machine into a known state and tri-states the outputs. The
duration of RESET should be 10 CLK cycles minimum. At least 5 clock cycles
are required after deassertion of RESET before beginning the next cycle. Reset
will abort a concurrent bus cycle.
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PIN DESCRIPTIONS (Continued) 


Symbol | Pin | Function
Vpp | 2 | PROGRAMMING POWER SUPPLY
(symbol unreadable) | 15, 19, 21 | (function unreadable)
Vcc | (first pin unreadable), 6, 20, 44 | SUPPLY VOLTAGE INPUT


Table 1. Performance Capability 


25/20 MHz 2 WS NON-BUFFERED: 4 WORDS/8 CLOCK CYCLES, 50/40 MBYTES/SEC

ADDR | A0 | WS | WS | -   | -   | -   | -   | RS | A1 | WS | WS | -   | -   | -   | -   | RS
DATA | -  | -  | -  | D00 | D01 | D02 | D03 | -  | -  | -  | -  | D10 | D11 | D12 | D13 | -
CLK  | C1 | C2 | C3 | C4  | C5  | C6  | C7  | C8 | C1 | C2 | C3 | C4  | C5  | C6  | C7  | C8

20 MHz 1 WS NON-BUFFERED: 4 WORDS/7 CLOCK CYCLES, 45 MBYTES/SEC

ADDR | A0 | WS | -   | -   | -   | -   | RS | A1 | WS | -   | -   | -   | -   | RS
DATA | -  | -  | D00 | D01 | D02 | D03 | -  | -  | -  | D10 | D11 | D12 | D13 | -
CLK  | C1 | C2 | C3  | C4  | C5  | C6  | C7 | C1 | C2 | C3  | C4  | C5  | C6  | C7

16 MHz 1 WS BUFFERED: 4 WORDS/7 CLOCK CYCLES, 36 MBYTES/SEC

ADDR | A0 | WS | -   | -   | -   | -   | RS | A1 | WS | -   | -   | -   | -   | RS
DATA | -  | -  | D00 | D01 | D02 | D03 | -  | -  | -  | D10 | D11 | D12 | D13 | -
CLK  | C1 | C2 | C3  | C4  | C5  | C6  | C7 | C1 | C2 | C3  | C4  | C5  | C6  | C7


INTERFACE EXAMPLE


Overview

The following design offers a simple interface to the
80960KA/KB’s bus.

A non-buffered 27960KX burst EPROM system is
shown in Figure 4. Since the 27960KX is capable of
driving a 120 pF load, large, non-buffered systems
can be implemented by stacking up to 2 banks of 4
EPROMs, giving a memory size of 256K x 32. The
input capacitive load seen on the address lines (due
to the EPROM only) is 24 pF for a 128K x 32
system (shown) and 48 pF for a 256K x 32 system.
The EPROM is specified at 4 pF for input capacitance
and 12 pF typical for output capacitance.
Larger systems can be implemented with buffers.


Chip Select Logic

High order address lines are decoded to provide CS.
Qualification with other signals is not required. The
chip select logic can be implemented with standard
asynchronous decoders, PALs or PLDs (like Intel’s
85C960).
[Figure 4 diagram: 80960KX processor, clock generator (CLK2 at 50 MHz, CLK at 25 MHz), address latches feeding non-burst mode memory from LAD(31:4), 85C960 chip select logic, and 27960KX 128K x 8 EPROMs on LAD(31:0) with BLAST, CLK and RESET.]

NOTE:
27960KX does not require address latches


Figure 4. 128K x 32 Burst EPROM System 


Waveforms 


Figure 5 shows the timing waveforms of 27960KX
reads in a 32-bit system.


CS setup time 


CS setup time is the time between CS asserted and
the first rising edge of CLK (during the address
cycle). Since a memory access begins on the first
CLK rising edge after CS is asserted, a minimum CS
setup time of 5 ns (tSVCH) at 25 MHz is required.
With the 80960KA/KB’s maximum valid address delay
of 18 ns at 25 MHz, 13 ns remains for CS decoding
logic.


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CS Deassert between bursts 


After every EPROM read (one to four words) CS 
must be deasserted. 


RESET and Active-Low RESET


The 27960KX uses an active-low RESET. The 80960KA/KB
RESET signal must be inverted for the 27960KX.


Clock Phase


The initial rising edge of CLK and CLK2 must be in
phase with as small a skew as possible.
NOTES:
1. In the 1-0-0-0 notation, the 1 indicates the number of wait states to access the first word and the
0’s indicate the number of wait states for subsequent data words (0 in this case).
2. The 27960KX latches addresses on the rising edge of ALE; it has an internal address generator which
increments addresses for subsequent words of the burst.


Figure 5. Two Cycles of a 27960KX 1 Wait State, 4-Byte Read (1-0-0-0 Burst Read) in a 32-Bit System 


27960KX DEVICE NAMES 


The device names on the 27960KX were derived as 
mnemonics that correspond to the number of wait 
states and expected operating frequency for the de- 
vice. For example, the 25 MHz, 2 wait state 
27960KX is named 27960K2-25. 
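
The naming rule can be expressed as a small parser. This is a sketch only: the regular expression covers the pattern shown in the example above (wait states 1 or 2, followed by the frequency in MHz), and the function name is made up for illustration.

```python
import re


def parse_device_name(name):
    """Split a name such as '27960K2-25' into wait states and clock frequency."""
    match = re.fullmatch(r"27960K([12])-(\d+)", name)
    if match is None:
        raise ValueError(f"not a 27960KX device name: {name!r}")
    return {"wait_states": int(match.group(1)), "clock_mhz": int(match.group(2))}


print(parse_device_name("27960K2-25"))
```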


AC TIMING DERIVATIONS 


The AC timings for the 27960KX were generated
specifically to meet the requirements of the
80960KA/KB microprocessor. In each case the applicable
80960KA/KB clock frequency and AC timing
were taken together with an address buffer delay
(if needed) and a 4 ns positive clock skew or a 2 ns
negative clock skew (see Figure 6A) guardband to
generate the 27960KX AC timing. Worst case timings
were always assumed. The example below
shows how the 27960K1-20 tAVCH timing was derived.


@20 MHz the clock cycle is 50 ns.
t6 of the 80960KA/KB is 2-20 ns.
4 ns clock skew guardband.


27960K1-20 tAVCH = 50 ns - 20 ns - 4 ns
= 26 ns.


On timings such as this, where the EPROM is faster
than the microprocessor, we specified the EPROM’s
timing leaving the excess time as a guardband.


intel : 27960KX PRELIMINARY 


CLK2 
(to 80960) 
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NOTE: 
The 27960KX allows a positive clock skew (CLK2 leading CLK) of up to 4 ns and a negative clock skew (CLK2 lagging
CLK) of up to 2 ns. The larger positive clock skew takes into account longer trace lengths and heavier loading on the 1x
clock trace.


Figure 6A. Definition of Positive and Negative Clock Skew 
[Figure 6B diagram: a 16L8-7 combinatorial PAL and a 74F244 driver supply the clock to the 80960KB and CLK to four 27960KX devices.]
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NOTE: 
CLK and CLK2 are generated by the same PAL. This minimizes skew between CLK and CLK2. Both PAL outputs are fed
to a 74F244 driver. The EPROMs should be as close to the clock driver as possible.


Figure 6B. Example Clock Circuit with Minimum Skew
NOTE:
This clock generation circuit uses a 100 MHz oscillator. The EPROMs should be as close to the NAND drivers as
possible.


Figure 6C. Example Clock Circuit Using a 100 MHz Oscillator 


Decoders are needed for the system’s address (chip
select) decoding. For the 27960KX’s timings we assumed
a 5-10 ns chip select decoder for 16 MHz
and 20 MHz frequencies and a 5-9 ns decoder for
25 MHz systems. The example below shows how
the 27960K2-25 tSVCH timing was derived.


@25 MHz the clock cycle is 40 ns.
t6 of the 80960KA/KB is 2-18 ns.
Decoder = 9 ns.
4 ns clock skew guardband.


27960K2-25 tSVCH = 40 ns - 18 ns - 9 ns - 4 ns
= 9 ns


SYSTEM BUFFERING CONSIDERATIONS 


For many large system applications buffering may 
be required between the microprocessor and memo- 
ry devices. The 20 MHz — 2 WS and 16 MHz 
27960KX AC timings take this into account. For ap- 
plications at these frequencies not requiring buffer- 
ing these devices will provide an additional 5-10 ns 
of system guardband. 


The list below shows the buffers used in generating
these timings:


          Address    Output
          Buffer     Buffer
20 MHz    9 ns       5 ns
16 MHz    10 ns      7 ns


The 20 MHz buffers are slightly faster in keeping 
with the increased sensitivity for higher perform- 
ance. We chose the above buffers because of their 
wide availability. Significantly faster buffers are avail- 
able for applications requiring them. The example 
below shows tCHQV for the 27960K2-20.


@20 MHz the clock cycle is 50 ns.
t10 of the 80960KA/KB is 3 ns.
Output buffer for 20 MHz = 5 ns.
4 ns clock skew guardband.

27960K2-20 tCHQV = 50 ns - 5 ns - 3 ns - 4 ns
= 38 ns
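
The three worked derivations in this section follow one pattern: start from the clock period, subtract the worst-case processor, decoder and buffer delays, then subtract the clock skew guardband. A minimal sketch of that arithmetic follows; the function name and argument layout are illustrative, not part of the data sheet.

```python
def timing_budget_ns(clock_mhz, *delays_ns, skew_guardband_ns=4):
    """Time left for the EPROM after worst-case system delays and clock skew."""
    cycle_ns = 1000 / clock_mhz
    return cycle_ns - sum(delays_ns) - skew_guardband_ns


# 27960K1-20: 20 ns processor address delay.
print(timing_budget_ns(20, 20))
# 27960K2-25: 18 ns processor address delay plus a 9 ns chip select decoder.
print(timing_budget_ns(25, 18, 9))
# 27960K2-20: 5 ns output buffer plus 3 ns processor input setup.
print(timing_budget_ns(20, 5, 3))
```

The three calls reproduce the 26 ns, 9 ns and 38 ns results of the derivations above.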
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ABSOLUTE MAXIMUM RATINGS* 


Read Operating Temperature ........ 0°C to +70°C(8)
Case Temperature under Bias ....... -10°C to +80°C(8)
Storage Temperature ............... -65°C to +125°C
All Input or Output Voltages
  with Respect to Ground .......... -0.6V to +6.5V(4)
Voltage on A9
  with Respect to Ground .......... -0.6V to +13.0V(4)
Vpp Supply Voltage
  with Respect to Ground .......... -0.6V to +14.0V(4)
Vcc Supply Voltage
  with Respect to Ground .......... -0.6V to +7.0V(4)


NOTICE: This data sheet contains preliminary information
on new products in production. The specifications
are subject to change without notice. Verify with
your local Intel Sales office that you have the latest
data sheet before finalizing a design.

*WARNING: Stressing the device beyond the “Absolute
Maximum Ratings” may cause permanent damage.
These are stress ratings only. Operation beyond the
“Operating Conditions” is not recommended and extended
exposure beyond the “Operating Conditions”
may affect device reliability.

DC CHARACTERISTICS: READ OPERATION
0°C ≤ TA ≤ +70°C, Vcc = 5V ±10%, TTL Inputs


Parameter (symbols, limits and units are largely unrecoverable from the source)

Input Load Current
Output Leakage Current
Vpp Load Current Read (IPP)
Vcc Standby, Switching
Input Low Voltage
Output Low Voltage
Output High Voltage
Output Short Circuit (IOS): 100
NOTES:


1. Maximum current is with outputs unloaded.

2. ICC standby current assumes no output loading, i.e., IOH = IOL = 0 mA.

3. ICC is the sum of current through VCC3 + VCC4 and does not include the current through VCC1 and VCC2. (VCC1 and
VCC2 supply power to the output drivers. VCC3 and VCC4 supply power to the rest of the device.)

4. Minimum DC voltage on input and output pins is -0.5V. During transitions, this level may undershoot to -2.0V for
periods less than 20 ns.

5. Maximum DC voltage on input and output pins is Vcc + 0.5V which may overshoot to Vcc + 2.0V for periods less than
20 ns.

6. One output shorted for no more than one second. IOS is sampled but not 100% tested.

7. loc max measured with a 10.11 F capacitor between Vcc and Vss. 

8. This specification defines commercial product operating temperatures. 
EXPLANATION OF AC SYMBOLS

The nomenclature used for timing parameters is as
per IEEE STD 662-1980 IEEE Standard Terminology
for Semiconductor Memory.

Each timing symbol has five characters. The first is
always a “t” (for time). The second character represents
a signal name, e.g., (CLK, ALE, etc.). The third
character represents the signal’s level (high or low)
for the signal indicated by the second character. The
fourth character represents a signal name at which a
transition occurs marking the end of the time interval
being specified. The fifth character represents the
signal level indicated for the fourth character. The list
below shows character representations.

A: Address
B: BLAST
C: Clock
H: Logic High Level
L: ALE/Logic Low Level
P: Vpp Programming Voltage
Q: Data
R: Reset
S: Chip Select
t: Time
V: Valid
X: No longer a valid “driven” logic level
Z: Tri-state level
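
The five-character scheme can be decoded mechanically. The sketch below is an illustration: the letter tables are taken from the list of character representations, except that C (CLK), S (CS) and H (high) are inferred from table entries such as "tSVCH: CS Valid to CLK High", and the function name is hypothetical.

```python
SIGNALS = {"A": "Address", "B": "BLAST", "C": "CLK", "L": "ALE",
           "P": "Vpp", "Q": "Data", "R": "RESET", "S": "CS"}
LEVELS = {"H": "High", "L": "Low", "V": "Valid", "X": "Invalid", "Z": "High Z"}


def describe_symbol(symbol):
    """Expand a timing symbol such as 'tAVCH' into words."""
    if len(symbol) != 5 or symbol[0] != "t":
        raise ValueError(f"expected 't' plus four characters, got {symbol!r}")
    _, from_signal, from_level, to_signal, to_level = symbol
    return (f"{SIGNALS[from_signal]} {LEVELS[from_level]} to "
            f"{SIGNALS[to_signal]} {LEVELS[to_level]}")


print(describe_symbol("tAVCH"))
```

Note that L means ALE in a signal position and logic low in a level position, which is why two separate tables are used.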


AC CHARACTERISTICS: READ OPERATION 0°C ≤ TA ≤ +70°C, Vcc = 5V ±10%


Versions: 27960K2-25 (25 MHz, 2 Wait States) | 27960K1-20 (20 MHz, 1 Wait State) | 27960K2-20 (20 MHz, 2 Wait States) | 27960K1-16 (16 MHz, 1 Wait State)


Symbol   Characteristic
tAVCH    Address Valid to CLK High
tAVLH    Address Valid to ALE High
         ALE Low to ALE High
tLHAX    ALE High to Address Invalid
tSVCH    CS Valid to CLK High
tCNHSX   CLK High to CS Invalid
         CLK High to Data Valid
         CLK High to Data Invalid
tCHQZ    CLK High to Data High Z
tBVCH    BLAST Valid to CLK High (15 in each of the four version columns)
tCHBX    CLK High to BLAST Invalid

(Most Min/Max values and the Units and Notes columns are not recoverable from the source.)

NOTES: 

1. Valid signal level is meant to be either a logic high or logic low.

2. tCNHSX: The subscript N represents the number of wait states for this parameter. CS can be de-asserted (high) after the
number of wait states (N) has expired. The EPROM will continue to burst out data for the current cycle.

3. BLAST must be returned high before the next rising clock edge.
4. The sum of tCHQV + tAVCH + N·CLK will not equal actual tAVQV if independent test conditions are used to obtain tAVCH
and tCHQV (N = number of wait states).

5. CS must be deasserted after every burst read (see Figure 7).

6. Sampled, not 100% tested. The transition is measured ±500 mV from steady state voltage.

7. For capacitive loads above 120 pF, tCHQV can be derated by 1 ns/20 pF.
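
Note 7 can be applied as a simple linear adjustment. In the sketch below the derating is assumed to scale continuously with the load above 120 pF (the note does not say whether it is applied in 20 pF steps), and the 38 ns starting value is the 27960K2-20 tCHQV from the derivation earlier in this data sheet. The function name is illustrative.

```python
def derated_tchqv_ns(tchqv_ns, load_pf, rated_load_pf=120, pf_per_ns=20):
    """Clock-to-data-valid time adjusted for capacitive load above the rated 120 pF."""
    extra_pf = max(0, load_pf - rated_load_pf)
    # Assumption: 1 ns per 20 pF, applied proportionally to the excess load.
    return tchqv_ns + extra_pf / pf_per_ns


print(derated_tchqv_ns(38, 160))
```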


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AC CONDITIONS OF TEST 


Input Rise and Fall Times (10% to 90%) ....... 4 ns
Input Pulse Levels .................. 0.45V to 2.4V
Input Timing Reference Level ................ 1.5V
Output Timing Reference Level ...... 0.8V and 2.0V


Table 2. Mode Table 


Modes listed (the pin-level columns are not recoverable from the source):

Program
Program Verify
Program Inhibit
ID Byte 3: 1 Wait-State
ID Byte 3: 2 Wait-States


NOTES: 

1. VIH until data terminated at which time BLAST must go to VIL.

2. Need to toggle from VIH to VIL to VIH to latch address.

3. See DC Programming Characteristics for VCC, VID and VPP voltages.

4. X can be VIL or VIH.

5. Vpp = Vcc to meet standby current specification. Vcc > Vpp > Vi will cause a slight increase in standby current. 
6. The device must be in the idle state (by asserting RESET or using BLAST) before going into standby. 
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CAPACITANCE(1) T, = 25°C, f = 1.0 MHz 


NOTE: 
1. Sampled, not 100% tested 


AC INPUT/OUTPUT REFERENCE WAVEFORMS


[Figures 290237-14 and 290237-15: input/output reference waveform and device-under-test load circuit]

AC test inputs are driven at 2.4V (VOH) for a logic ‘1’
and 0.45V (VOL) for a logic ‘0’.
Input timing begins at 1.5V.
Output timing ends at VIH (2.0V) and VIL (0.8V).
Input rise and fall times (10% to 90%) ≤ 4.0 ns.


For tCHQZ, CL = 5 pF and RL = 405Ω.
CL includes jig capacitance.


CLOCK CHARACTERISTICS 


Max CLK Rise Time during Programming is 100 ns 


CLOCK WAVEFORM 
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Program/Program Verify 


Initially, and after each erasure, all bits of the
EPROM are in the “1’s” state. Data is introduced by
selectively programming “0’s” into the desired bit
locations. Although only “0’s” can be programmed,
both “1’s” and “0’s” can be present in the data
word. Ultraviolet erasure is the only way to change
“0’s” to “1’s”.

Program mode is entered when Vpp is raised to 
12.75V. Program/Verify operation is synchronous 
with the clock and can only be initiated following an 
idle state. Program and Program Verify take place in 
3 clock cycles. In the first clock cycle, addresses 
and data are input and programming occurs. Pro- 
gram Verify follows in the second clock cycle and 
the third clock cycle terminates synchronous Pro- 
gram/Verify operation, returning the state machine 
to the idle state with outputs at high impedance. 


As in the Read mode, A2-A16 point to a four byte
block in the memory array. During Programming the
internal address increment circuitry is disabled and
the programmer must supply A0 and A1 to point to
an individual byte within the four byte block that is to
be programmed. Only one byte is programmed in
each 3 cycle Program/Verify sequence.


Program Inhibit


Program Inhibit mode allows parallel programming
and verification of multiple devices with different
data. With Vpp at 12.75V, a Program/Verify sequence
is initiated for any device that receives a valid
ALE pulse and rising clock edge while CS is asserted.
A PGM pulse programs data in the first cycle
of the sequence and data for Program Verify is output
in the second cycle. The Program/Verify sequence
is inhibited on any devices for which CS is
not asserted during the first (ALE) cycle. Data will
not be programmed and the outputs will remain in
their high impedance state.


inteligent Identifier™ Mode


The device’s manufacturer, product type, and configuration
are stored in a four byte block that can be
accessed by using the inteligent Identifier™ mode.
The programmer can verify the device identifier and
choose the programming algorithm that corresponds
to the Intel 27960KX. The inteligent Identifier can
also be used to verify that the product is configured
with the desired Read mode options for wait states.


inteligent Identifier mode is entered when A9 (pin
32) is raised to its high voltage (VH) level. The internal
state machine is then set for inteligent Identifier
Read operation. Reading the Identifier is similar to a
Read operation on a one wait state configured product.
Up to four bytes can be read in a single burst
access. inteligent Identifier read is terminated by a
synchronous BLAST input, returning the state machine
to the idle state with outputs at high impedance.


The four byte block for the inteligent Identifier
code is located at addresses 00H through 03H and is
encoded as follows:


MEANING          (A1, A0)    DATA

Intel ID         Byte 00     89h
27960            Byte 01     E0h
KX               Byte 10     00b
1 wait state     Byte 11     01b
2 wait states    Byte 11     10b
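
A programmer could check the four identifier bytes along these lines. This is a sketch under assumptions: the values given in binary (00b, 01b, 10b) are assumed to occupy the low bits of their bytes with all other bits zero, and the function and key names are made up for illustration.

```python
def decode_identifier(block):
    """Interpret the four identifier bytes read from addresses 00H-03H."""
    if len(block) != 4:
        raise ValueError("the identifier block is four bytes long")
    manufacturer, product, family, configuration = block
    return {
        "is_intel": manufacturer == 0x89,
        "is_27960": product == 0xE0,
        "is_kx": family == 0b00,
        # Assumed encoding: 01b means 1 wait state, 10b means 2; anything else is unknown.
        "wait_states": {0b01: 1, 0b10: 2}.get(configuration),
    }


print(decode_identifier(bytes([0x89, 0xE0, 0x00, 0x02])))
```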


RESET MODE


Due to the synchronous nature of the 27960KX, the
various operating modes must be initiated from a
known idle state. During normal operation, the internal
state machine returns to an idle state at the
termination of a bus access (after BLAST is asserted).
During initial device power up, the state machine is
in an indeterminate state. The reset mode is provided
to force operation into the idle state. Reset mode
is entered when the RESET pin is asserted. Output
pins are asynchronously set to the high impedance
state and address latches are put into the flow
through mode. A reset is successfully completed
and the state machine set in an idle state in the
cycle after RESET has been asserted for a minimum
of 10 clock cycles and deasserted for five clock cycles.
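
The cycle counts above translate into minimum durations once the clock frequency is known. A minimal sketch, with an illustrative function name:

```python
def reset_timing_ns(clock_mhz, asserted_cycles=10, recovery_cycles=5):
    """Minimum RESET assertion time and post-reset recovery time in nanoseconds."""
    cycle_ns = 1000 / clock_mhz
    return asserted_cycles * cycle_ns, recovery_cycles * cycle_ns


for mhz in (16, 20, 25):
    asserted_ns, recovery_ns = reset_timing_ns(mhz)
    print(f"{mhz} MHz: assert for {asserted_ns:.0f} ns, then wait {recovery_ns:.0f} ns")
```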
[Flowchart boxes: START; ADDRESS = FIRST LOCATION; Vpp = 12.75V; PROGRAM ONE 100 µs PULSE; INCREMENT X; LAST ADDRESS? (if not, INCREMENT ADDRESS); DEVICE FAILED; Vcc = 5.0V, Vpp = 12.75V; COMPARE ALL BYTES TO ORIGINAL; DEVICE PASSED]

Figure 8. Quick-Pulse Programming™ Algorithm 
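
The flow in Figure 8 can be sketched as a loop. Everything about the device model below is hypothetical: `SimulatedEprom` stands in for programmer hardware, supply voltage switching is not modelled, and only the pulse limit (25), the pulse width (100 µs) and the final compare come from the data sheet.

```python
MAX_PULSES = 25   # the algorithm gives up after 25 pulses on one byte
PULSE_US = 100    # each program pulse lasts 100 microseconds

# Lower bound on pulse time for 128K bytes at one pulse per byte, in seconds.
MIN_PULSE_TIME_S = 128 * 1024 * PULSE_US / 1_000_000


class SimulatedEprom:
    """Hypothetical device model: a byte takes effect after a set number of pulses."""

    def __init__(self, size, pulses_needed=1):
        self.cells = bytearray([0xFF]) * size   # erased state: every bit is 1
        self.pulses_needed = pulses_needed
        self.pulse_counts = [0] * size

    def pulse(self, address, data):
        self.pulse_counts[address] += 1
        if self.pulse_counts[address] >= self.pulses_needed:
            self.cells[address] &= data          # programming can only clear bits

    def read(self, address):
        return self.cells[address]


def quick_pulse_program(device, data):
    """Return True if every byte programs within 25 pulses and the final compare passes."""
    for address, byte in enumerate(data):
        for _ in range(MAX_PULSES):
            device.pulse(address, byte)
            if device.read(address) == byte:     # byte verify after each pulse
                break
        else:
            return False                         # DEVICE FAILED
    # Final pass: compare all bytes to the original data.
    return all(device.read(a) == b for a, b in enumerate(data))


print(quick_pulse_program(SimulatedEprom(4, pulses_needed=3), bytes([0x00, 0x5A, 0xFF, 0x12])))
```

At one pulse per byte the pulses alone take about 13.1 seconds for the full array, which is consistent with the quoted figure of under 17 seconds on optimized equipment.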
QUICK-PULSE PROGRAMMING ALGORITHM

The Quick-Pulse Programming algorithm programs
Intel’s 27960KX. Developed to substantially reduce
programming throughput time, this algorithm allows
optimized equipment to program a 27960KX in under
17 seconds. Actual programming time depends
on the programmer used.

The Quick-Pulse Programming algorithm uses a
100 µs pulse followed by a byte verification to determine
when the addressed byte is correctly programmed.
The algorithm terminates if 25 100 µs
pulses fail to program a byte. Figure 8 shows the
Quick-Pulse Programming algorithm flowchart.

The entire program-pulse, byte-verify sequence is
performed with Vcc = 6.25V and Vpp = 12.75V.
The programming equipment must establish Vcc before
applying voltages to any other pins. When programming
is complete, all bytes should be compared
to the original data with Vcc = 5.0V and Vpp =
12.75V.


D.C. PROGRAMMING CHARACTERISTICS TA = 25°C ±5°C


Symbol   Parameter                           Notes
ILI      Input Load Current
ICC      Vcc Program Current                 1
IPP      Vpp Program Current                 1
VIL      Input Low Voltage
VIH      Input High Voltage
VOL      Output Low Voltage (Verify)
VOH      Output High Voltage (Verify)
VID      A9 inteligent Identifier Voltage
VCC      Supply Voltage (Program)
VPP      Program Voltage

(The Min, Max, Unit and Test Condition columns are not recoverable from the source.)

NOTES: 

1. The maximum current value is with outputs unloaded.

2. Vcc must be applied simultaneously or before Vpp and removed simultaneously or after Vpp.
3. During programming clock levels are VIH and VIL.
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Address Valid to PGM Low 
CLK High to Address Invalid
ALE Low to CLK High
CLK High to ALE High (tCHLH)
CS Valid to CLK High (tSVCH)
CLK High to CS Invalid
CLK High to DOUT Valid
CLK High to DOUT Invalid
BLAST Valid to CLK High (tBVCH)
CLK High to BLAST Invalid
DATA Valid to PGM Low
PGM Program Pulse Width
PGM High to DIN Invalid
CLK Low to PGM Low
DIN in Tri-State to CLK High
Vcc Program Voltage to CLK High
Vpp Program Voltage to CLK High
A9 VID Voltage to CLK High

(The remaining symbols and the Min, Max and Units columns are not recoverable from the source.)

NOTES: 

1. If CS is low, ALE can go low no sooner than the falling edge of the previous CLK.

2. ALE must return high prior to the next rising edge of clock.

3. CS must remain low until after the rising edge of CLK1.

4. BLAST must return high prior to the next rising edge of CLK. 

5. Max CLK rise/fall time is 100 ns. 

6. RESET must be held low for 10 cycles and high for 5 cycles before performing a read. 

7. Vcc must be applied simultaneously or before Vpp and removed simultaneously or after Vpp. 
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Figure 10. 27960KX RESET and ID Waveforms | 
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82596CA 
HIGH-PERFORMANCE 32-BIT LOCAL 
AREA NETWORK COPROCESSOR 


■ Performs Complete CSMA/CD Medium Access Control (MAC) Functions Independently of CPU
  - IEEE 802.3 (EOC) Frame Delimiting
  - HDLC Frame Delimiting
■ Supports Industry Standard LANs
  - IEEE TYPE 10BASE-T, IEEE TYPE 10BASE5 (Ethernet*), IEEE TYPE 10BASE2 (Cheapernet),
    IEEE TYPE 1BASE5 (StarLAN), and the Proposed Standard 10BASE-F
  - Proprietary CSMA/CD Networks Up to 20 Mb/s
■ On-Chip Memory Management
  - Automatic Buffer Chaining
  - Buffer Reclamation after Receipt of Bad Frames; Optional Save Bad Frames
  - 32-Bit Segmented or Linear (Flat) Memory Addressing Formats
■ Network Management and Diagnostics
  - Monitor Mode
  - 32-Bit Statistical Counters
■ 82586 Software Compatible
■ Optimized CPU Interface
  - Optimized Bus Interface to Intel’s i486™ DX, i486™ SX and 80960CA Processors
  - Supports Big Endian and Little Endian Byte Ordering
■ 32-Bit Bus Master Interface
  - 106 MB/s Bus Bandwidth
  - Burst Bus Transfers
  - Bus Throttle Timers
  - Transfers Data at 100% of Serial Bandwidth
  - 128-Byte Receive FIFO, 64-Byte Transmit FIFO
■ Self-Test Diagnostics
■ Configurable Initialization Root for Data Structures
■ High-Speed, 5V, CHMOS** IV Technology
■ 132-Pin Plastic Quad Flat Pack (PQFP) and PGA Package
  (See Packaging Spec Order No. 240800-001, ...)


*Ethernet is a registered trademark of Xerox Corporation.
**CHMOS is a patented process of Intel Corporation.
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Subsystem 


Serial 
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8 Transmit 
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INTRODUCTION 


The 82596CA is an intelligent, high-performance
32-bit Local Area Network coprocessor. The
82596CA implements the CSMA/CD access method
and can be configured to support all existing IEEE
802.3 standards: TYPEs 10BASE-T, 10BASE5,
10BASE2, 1BASE5, and 10BROAD36. It can also be
used to implement the proposed standard TYPE
10BASE-F. The 82596CA performs high-level commands,
command chaining, and interprocessor communications
via shared memory, thus relieving the
host CPU of many tasks associated with network
control. All time-critical functions are performed
independently of the CPU; this increases network
performance and efficiency. The 82596CA bus interface
is optimized for Intel’s i486™ SX, i486™ DX,
80960CA, and 80960KB processors.


The 82596CA implements all IEEE 802.3 Medium
Access Control and channel interface functions;
these include framing, preamble generation and
stripping, source address generation, destination
address checking, short-frame detection, and automatic
length-field handling. Data rates up to 20 Mb/s are
supported.


The 82596CA provides a powerful host system interface.
It manages memory structures automatically,
with command chaining and bidirectional data chaining.
An on-chip DMA controller manages four channels;
this allows autonomous transfer of data blocks
(buffers and frames) and relieves the CPU of byte
transfer overhead. Buffers containing errored or collided
frames can be automatically recovered without
CPU intervention. The 82596CA provides an upgrade
path for existing 82586 software drivers by
providing an 82586-software-compatible mode that
supports the current 82586 memory structure. The
82596CA also has a Flexible memory structure and
a Simplified memory structure. The 82596CA can
address up to 4 gigabytes of memory. The 82596CA
supports Little Endian and Big Endian byte ordering.


The 82596CA bus interface can achieve a burst 
transfer rate of 106 MB/s at 33 MHz. The bus inter- 
face employs bus throttle timers to regulate 
82596CA bus use. Two large, independent FIFOs— 
128 bytes for Receive and 64 bytes for Transmit— 
tolerate long bus latencies and provide programma- 
ble thresholds that allow the user to optimize bus 
overhead for any worst-case bus latency. The high- 
performance bus is capable of back-to-back transmission
and reception during the IEEE 802.3 9.6-µs
Interframe Spacing (IFS) period.


The 82596CA provides a wide range of diagnostics
and network management functions; these include
internal and external loopback, exception condition
tallies, channel activity indicators, optional capture
of all frames regardless of destination address
(promiscuous mode), optional capture of errored or
collided frames, and time domain reflectometry for
locating fault points on the network cable. The statistical
counters, in 32-bit segmented and linear
modes, are 32 bits each and include CRC errors,
alignment errors, overrun errors, resource errors,
short frames, and received collisions. The 82596CA
also features a monitor mode for network analysis.
In this mode the 82596CA can capture status bytes,
and update statistical counters, of frames monitored
on the link without transferring the contents of the
frames to memory. This can be done concurrently
while transmitting and receiving frames destined for
that station.


The 82596CA can be used in both baseband and 
broadband networks. It can be configured for maxi- 
mum network efficiency (minimum contention over- 
head) with networks of any length. Its highly flexible 
CSMA/CD unit supports address field lengths of 
zero through six bytes—configurable to either IEEE 
802.3/Ethernet or HDLC frame delimitation. It also 
supports 16- or 32-bit cyclic redundancy checks. 
The CRC can be transferred directly to memory for 
receive operations, or dynamically inserted for trans- 
mit operations. The CSMA/CD unit can also be con- 
figured for full duplex operation for high throughput 
in point-to-point connections. 


82596 B-Stepping 


The 82596 B-step incorporates new features com- 
pared to the 82596 A1 stepping. The following is a 
summary of the new 82596 B-step features. 


• The 82596 B-step transmit buffers can now be 
  byte aligned. 


• In big endian mode, and when configured to Linear 
  mode, the 82596 B-step treats 32-bit address 
  pointers as big endian 32-bit entities. However, 
  the SCB absolute address and statistical counters 
  are still treated as two 16-bit big endian entities. 
  This big endian 32-bit entity support is configured 
  through the SYSBUS byte; not setting this mode 
  configures the 82596 B-step to be 100% compatible 
  with the 82596 A1-step big endian mode. 


• The 82596 B-step has improved performance on 
  back-to-back frame transmission. 


• The 82596 B-step can be configured to reread 
  the next Command Block on the CB list upon 
  receiving a CU RESUME Control Command. 


The 82596CA is fabricated with Intel’s reliable, 5-V, 
CHMOS IV (process 648.8) technology. It is avail- 
able in a 132-pin PQFP or PGA package. 


Figure 2. 82596CA PQFP Pin Configuration 


Figure 3. 82596CA PGA Pinout 


82596CA PGA Cross Reference by Pin Name 


CLOCK. The system clock input provides the fundamental timing for 
the 82596. It is a 1X CLK input used to generate the 82596 clock and 
requires TTL levels. All external timing parameters are specified in 
reference to the rising edge of CLK. 


DATA BUS. The 32 Data Bus lines are bidirectional, tri-state lines that 
provide the general purpose data path between the 82596 and 
memory. With the 82596 the bus can be either 16 or 32 bits wide; this 
is determined by the BS16 signal. The 82596 always drives all 32 data 
lines during Write operations, even with a 16-bit bus. D31-D0 are 
floated after a Reset or when the bus is not acquired. 
These lines are inputs during a CPU Port access; in this mode the CPU 
writes the next address to the 82596 through the data lines. During 
PORT commands (Relocatable SCP, Self-Test, Reset and Dump) the 
address must be aligned to a 16-byte boundary. This frees the D3-D0 
lines so they can be used to distinguish the commands. The following 
is a summary of the decoding data. 


D0-D31 14-53 I/O 


Reset 
Relocatable SCP 
Self-Test 

Dump Command 


DATA PARITY (DP0-DP3, I/O). These are tri-stated data parity pins. There 
is one parity line for each byte of the data bus. The 82596 drives them 
with even-parity information during write operations having the same 
timing as data writes. Likewise, even-parity information, with the same 
timing as read information, must be driven back to the 82596 over these 
pins to ensure that the correct parity check status is indicated by the 
82596. 


PARITY CHECK (PCHK). This pin is driven high one clock after RDY to 
inform Read operations of the parity status of data sampled at the end 
of the previous clock cycle. When driven low it indicates that incorrect 
parity data has been sampled. It only checks the parity status of enabled 
bytes, which are indicated by the Byte Enable and Bus Size signals. 
PCHK is only valid for one clock time after data read is returned to the 
82596; i.e., it is inactive (high) at all other times. 


ADDRESS LINES. These 30 tri-stated Address lines output the 
address bits required for memory operation. These lines are floated 
after a Reset or when the bus is not acquired. 


BYTE ENABLE. These tri-stated signals are used to indicate which 
bytes are involved with the current memory access. The number of 
Byte Enable signals asserted indicates the physical size of the data 
being transferred (1, 2, 3, or 4 bytes). 

• BE0 indicates D7-D0 

• BE1 indicates D15-D8 

• BE2 indicates D23-D16 

• BE3 indicates D31-D24 
These lines are floated after a Reset or when the bus is not acquired. 


WRITE/READ. This dual function pin is used to distinguish Write and 
Read cycles. This line is floated after a Reset or when the bus is not 
acquired. 


PIN DESCRIPTIONS (Continued) 


ADDRESS STATUS. The 82596 uses this tri-state pin to indicate 
that a valid bus cycle has begun and that A31-A2, BE3-BE0, 
and W/R are being driven. It is asserted during t1 bus states. This line 
is floated after a Reset or when the bus is not acquired. 


READY. Active low. This signal is the acknowledgment from 
addressed memory that the transfer cycle can be completed. When 
high, it causes wait states to be inserted. It is ignored at the end of the 
first clock of the bus cycle’s data cycle. This active-low signal does not 
have an internal pull-up resistor. This signal must meet the setup and 
hold times to operate correctly. 


BURST READY. Active low. Burst Ready, like RDY, indicates that the 
external system has presented valid data on the data pins in response 
to a Read, or that the external system has accepted the 82596 data in 
response to a Write request. Also, like RDY, this signal is ignored at 
the end of the first clock in a bus cycle. If the 82596 can still receive 
data from the previous cycle, ADS will not be asserted in the next 
clock cycle; however, Address and Byte Enable will change to reflect 
the next data item expected by the 82596. BRDY will be sampled 
during each succeeding clock and if active, the data on the pins will be 
strobed to the 82596 or to external memory (read/write). BRDY 
operates exactly like READY during the last data cycle of a burst 
sequence and during nonburstable cycles. 


BURST LAST. A signal (active low) on this tri-state pin indicates that 
the burst cycle is finished and when BRDY is next returned it will be 
treated as a normal ready; i.e., another set of addresses will be driven 
with ADS or the bus will go idle. BLAST is not asserted if the bus is not 
acquired. 


ADDRESS HOLD. This hold signal is active high; it allows another bus 
master to access the 82596 address bus. In a system where an 82596 
and an i486 processor share the local bus, AHOLD allows the cache 
controller to make a cache invalidation cycle while the 82596 holds the 
address lines. In response to a signal on this pin, the 82596 
immediately (i.e. during the next clock) stops driving the entire address 
bus (A31-A2); the rest of the bus can remain active. For example, 
data can be returned for a previously specified bus cycle during 
Address Hold. The 82596 will not begin another bus cycle while 
AHOLD is active. 


BACKOFF. This signal is active low; it informs the 82596 that another 
bus master requires access to the bus before the 82596 bus cycle 
completes. The 82596 immediately (i.e. during the next clock) floats its 
bus. Any data returned to the 82596 while BOFF is asserted is ignored. 
BOFF has higher priority than RDY or BRDY; if two such signals are 
returned in the same clock period, BOFF is given preference. The 
82596 remains in Hold until BOFF goes high, then the 82596 resumes 
its bus cycle by driving out the address and status, and asserting ADS. 
BOFF should not be asserted during t1. 


LOCK. This tri-state pin is used to distinguish locked and unlocked bus 
cycles. LOCK generates a semaphore handshake to the CPU. LOCK 
can be active for several memory cycles; it goes active during the first 
locked memory cycle (t1) and goes inactive at the last locked cycle 
(t2). This line is floated after a Reset or when the bus is not acquired. 
LOCK can be disabled via the sysbus byte in software. 


BUS SIZE. This signal allows the 82596CA to work with either 16- or 
32-bit buses. Asserting BS16 low causes the 82596 to perform two 16- 
bit memory accesses when transferring 32-bit data. In little endian 
mode the D15-D0 lines are driven when BS16 is asserted; in big 
endian mode the D31-D16 lines are driven. 


HOLD. The HOLD signal is active high; the 82596 uses it to request 
local bus mastership. In normal operation HOLD goes inactive before 
HLDA. The 82596 can be forced off the bus by deasserting HLDA or if 
the bus throttle timers expire. 


HOLD ACKNOWLEDGE. The HLDA signal is active high; it indicates 
that bus mastership has been given to the 82596. HLDA is internally 
synchronized; after HOLD is detected low, the CPU drives HLDA low. 
NOTE: 
Do not connect HLDA to Vcc; it will cause a deadlock. A user wanting 
to give the 82596 permanent access to the bus should connect HLDA 
to HOLD. If HLDA goes inactive before HOLD, the 82596 will release 
the bus (by deasserting HOLD) within a maximum number of bus 
cycles as specified in the 82596 User’s Manual. 


BUS REQUEST. This signal, when configured to an externally 
activated mode, is used to trigger the bus throttle timers. 


PORT. When this signal is received, the 82596 latches the data on the 
data bus into an internal 32-bit register. When the CPU is asserting this 
signal it can write into the 82596 (via the data bus). This pin must be 
activated twice during all CPU Port access commands. 


RESET. This active high, internally synchronized signal causes the 
82596 to terminate current activity. The signal must be high for at least 
five system clock cycles. After five system clock cycles and four TxC 
clock cycles the 82596 will execute a Reset when it receives a high 
RESET signal. When RESET returns to low the 82596 waits for the 
first CA signal and then begins the initialization sequence. 


LITTLE ENDIAN/BIG ENDIAN. This dual-function pin is used to 
select byte ordering. When LE/BE is high, little endian byte ordering is 
used; when low, big endian byte ordering is used for data in frames 
(bytes) and for control (SCB, RFD, CBL, etc). 


CHANNEL ATTENTION. The CPU uses this pin to force the 82596 to 
begin executing memory resident Command blocks. The CA signal is 
internally synchronized. The signal must be high for at least one 
system clock. It is latched internally on the high to low edge and then 
detected by the 82596. 


The first CA after a Reset forces the 82596 into the initialization 
sequence beginning at location 00FFFFF6h or an SCP address written 
to the 82596 using CPU Port access. All subsequent CA signals cause 
the 82596 to begin executing new command sequences from the SCB. 


INTERRUPT. A high signal on this pin notifies the CPU that the 82596 
is requesting an interrupt. This signal is an edge triggered interrupt 
signal, and can be configured to be active high or low. 


TRANSMIT DATA. This pin transmits data to the serial link. It is high 
when not transmitting. 


TRANSMIT CLOCK. This signal provides the fundamental timing for 
the serial subsystem. The clock is also used to transmit data 
synchronously on the TxD pin. For NRZ encoding, data is transferred 
to the TxD pin on the high to low clock transition. For Manchester 
encoding, the transmitted bit center is aligned with the low to high 
transition. Transmit clock must always be running for proper device 
operation. 


LOOPBACK. This TTL-level control signal enables the loopback 
mode. In this mode serial data on the TxD input is routed through the 
82C501 internal circuits and back to the RxD output without driving the 
transceiver cable. To enable this signal, both internal and external 
loopback need to be set with the Configure command. 


RECEIVE DATA. This pin receives NRZ serial data only. It must be 
high when not receiving. 


RECEIVE CLOCK. This signal provides timing information to the 
internal shifting logic. For NRZ data the state of the RxD pin is 
sampled on the high to low transition of the clock. 


REQUEST TO SEND. When this signal is low the 82596 informs the 
external interface that it has data to transmit. It is forced high after a 
Reset or when transmission is stopped. 


CLEAR TO SEND. An active-low signal that enables the 82596 to 
send data. It is normally used as an interface handshake to RTS. 
Asserting CTS high stops transmission. CTS is internally synchronized. 
If CTS goes inactive, meeting the setup time to the TxC negative edge, 
the transmission will stop and RTS will go inactive within, at most, two 
TxC cycles. 


CARRIER SENSE. This signal is active low; it is used to notify the 
82596 that traffic is on the serial link. It is only used if the 82596 is 
configured for external Carrier Sense. In this configuration external 
circuitry is required for detecting traffic on the serial link. CRS is 
internally synchronized. To be accepted, the signal must remain active 
for at least two serial clock cycles (for CRSF = 0). 


COLLISION DETECT. This active-low signal informs the 82596 that a 
collision has occurred. It is only used if the 82596 is configured for 
external Collision Detect. External circuitry is required for collision 
detection. CDT is internally synchronized. To be accepted, the signal 
must remain active for at least two serial clock cycles (for CDTF = 0). 


The 82596CA and the host CPU communicate 
through shared memory. Because of its on-chip 
DMA capability, the 82596 can make data block 
transfers (buffers and frames) independently of the 
CPU; this greatly reduces the CPU byte transfer 
overhead. 


The 82596 is a multitasking coprocessor that com- 
prises two independent logical units—the Command 
Unit (CU) and the Receive Unit (RU). The CU exe- 
cutes commands from shared memory. The RU han- 
dles all activities related to frame reception. The in- 
dependence of the CU and RU enables the 82596 to 
engage in both activities simultaneously—the CU 
can fetch and execute commands from memory 
while the RU is storing received frames in memory. 
The CPU is only involved with this process after the 
CU has executed a sequence of commands or the 
RU has finished storing a sequence of frames. 


The CPU and the 82596 use the hardware signals 
Interrupt (INT) and Channel Attention (CA) to initiate 
communication with the System Control Block 
(SCB); see Figure 4. The 82596 uses INT to alert the 
CPU of a change in the contents of the SCB; the 
CPU uses CA to alert the 82596. 


The 82596 has a CPU Port Access state that allows 
the CPU to execute certain functions without ac- 
cessing memory. The 82596 PORT pin and data bus 
pins are used to enable this feature. The CPU can 
directly activate four operations when the 82596 is in 
this state. 


• Write an alternative System Configuration Pointer 
  (SCP). This can be used when the 82596 cannot 
  use the default SCP address space. 


• Write a different Dump Command Pointer and 
  execute Dump. This can be used for troubleshooting 
  No Response problems. 


• Reset the 82596 via software without disturbing 
  the rest of the system. 


• Run a self-test for board testing; the 82596 will 
  execute a self-test and write the results to memory. 
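
The four CPU Port operations can be pictured as a single 32-bit word written over the data bus. The Python sketch below is illustrative only. The one rule taken from this document is that the address is aligned to a 16-byte boundary so that D3-D0 are free to carry the command. The numeric function codes (0 to 3) are an assumption based on commonly published 82596 driver sources and are not stated in the text above; `encode_port_command` and `decode_port_command` are hypothetical helper names.

```python
# Assumed function codes carried on D3-D0 (not given in the text above).
PORT_RESET = 0x0
PORT_SELF_TEST = 0x1
PORT_RELOCATE_SCP = 0x2
PORT_DUMP = 0x3


def encode_port_command(function, address=0):
    """Combine a 16-byte-aligned address with a function code in D3-D0."""
    if not 0 <= function <= 0xF:
        raise ValueError("function code must fit in D3-D0")
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if address & 0xF:
        raise ValueError("address must be aligned to a 16-byte boundary")
    return address | function


def decode_port_command(word):
    """Split a 32-bit PORT word back into (function, address)."""
    return word & 0xF, word & 0xFFFFFFF0


# Example: ask for a self-test with results written at 0x00100000.
word = encode_port_command(PORT_SELF_TEST, 0x00100000)
```

Because the low four address bits are always zero, no separate command register is needed; the alignment requirement is what makes the encoding unambiguous.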
82596 BUS INTERFACE 


The 82596CA has bus interface timings and pin defi- 
nitions that are compatible with Intel’s 32-bit 
i486™ SX and i486™ DX microprocessors. This 
eliminates the need for additional bus interface logic. 
Operating at 33 MHz, the 82596’s bus bandwidth 
can be as high as 106 MB/s. Since Ethernet only 
requires 1.25 MB/s, this leaves a considerable 
amount of bandwidth for the CPU. The 82596 also 
has a bus throttle to regulate its use of the bus. Two 
timers can be programmed through the SCB: one 
controls the maximum time the 82596 can remain on 
the bus, the other controls the time the 82596 must 
stay off the bus (see Figure 5). The bus throttle can 
be programmed to trigger internally with HLDA or 
externally with BREQ. These timers can restrict the 
82596 HOLD activation time and improve bus utiliza- 
tion. 
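
The bandwidth figures above can be checked with a little arithmetic. The sketch below assumes a burst moves 16 bytes in 5 clocks (an i486-style 2-1-1-1 burst); that burst shape is an assumption, not something stated in this section, but at 33 MHz it gives 105.6 MB/s, which is consistent with the quoted 106 MB/s. The 1.25 MB/s Ethernet figure is simply 10 Mb/s divided by 8.

```python
def burst_rate_mb_per_s(clock_hz, bytes_per_burst=16, clocks_per_burst=5):
    """Peak burst transfer rate in MB/s (burst shape is an assumption)."""
    return clock_hz * bytes_per_burst / clocks_per_burst / 1e6


def ethernet_rate_mb_per_s(bit_rate=10e6):
    """Raw Ethernet data rate in MB/s."""
    return bit_rate / 8 / 1e6


peak = burst_rate_mb_per_s(33e6)
ethernet = ethernet_rate_mb_per_s()
share = ethernet / peak  # fraction of peak bus bandwidth needed by Ethernet

print(f"peak burst rate: {peak:.1f} MB/s")
print(f"Ethernet needs:  {ethernet:.2f} MB/s ({share:.1%} of peak)")
```

Under these assumptions the network needs only a small share of the peak bus bandwidth, which is the point the paragraph makes about leaving bandwidth for the CPU.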


82596 MEMORY ADDRESSING 


The 82596 has a 32-bit memory address range, 
which allows addressing up to four gigabytes of 
memory. The 82596 has three memory addressing 
modes (see Table 1). 


• 82586 Mode. The 82596 has a 24-bit memory 
  address range. The System Control Block, Command 
  List, Receive Descriptor List, and Buffer 
  Descriptors must reside in one 64-KB memory 
  segment. Transmit and Receive buffers can reside 
  in a 24-bit address space. 


• 32-Bit Segmented Mode. The 82596 has a 32-bit 
  memory address range. The System Control 
  Block, Command List, Receive Descriptor List, 
  and Buffer Descriptors must reside in one 64-KB 
  memory segment. Transmit and Receive buffers 
  can reside in a 32-bit address space. 


• Linear Mode. The 82596 has a 32-bit memory 
  address range. Any memory structure can reside 
  anywhere within the 32-bit memory address 
  range. 
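
The placement rules for the three modes can be expressed as a small checker. This is a minimal sketch, not driver code: the mode names, the `check_layout` helper, and the representation of structures as `(address, length)` pairs are all hypothetical choices made for illustration. Only the 24-bit and 32-bit ranges and the one-64-KB-segment rule come from the list above.

```python
SEGMENT_SIZE = 64 * 1024

# mode name -> (address bits, control structures confined to one 64-KB segment)
MODES = {
    "82586": (24, True),
    "segmented": (32, True),
    "linear": (32, False),
}


def check_layout(mode, segment_base, control, buffers):
    """Return a list of rule violations for a proposed memory layout.

    control: (address, length) pairs for the SCB, command blocks,
             receive descriptors, and buffer descriptors.
    buffers: (address, length) pairs for transmit and receive buffers.
    """
    bits, segmented = MODES[mode]
    limit = 1 << bits
    problems = []
    for addr, length in list(control) + list(buffers):
        if addr < 0 or addr + length > limit:
            problems.append(f"{addr:#x} is outside the {bits}-bit address range")
    if segmented:
        for addr, length in control:
            if addr < segment_base or addr + length > segment_base + SEGMENT_SIZE:
                problems.append(f"{addr:#x} is outside the 64-KB segment")
    return problems
```

For example, a buffer at 1000000h is rejected in 82586 mode because it lies just past the 24-bit range, while the same layout passes in Linear mode.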
Figure 4. 82596 and Host CPU Intervention 


Figure 5. Bus Throttle Timers (82596 bus use without and with 
bus throttle timers) 
Table 1. 82596 Memory Addressing Formats 

                          Operation Mode 
Pointer or Offset         82586            32-Bit Segmented 
ISCP Address              24-Bit Linear 
Rx Frame Descriptors 
Tx Frame Descriptors 
                          24-Bit Linear    32-Bit Linear 
Tx Buffers                24-Bit Linear    32-Bit Linear 


Figure 6. 82596 Shared Memory Structure 


82596 SYSTEM MEMORY STRUCTURE 


The Shared Memory structure consists of four parts: 
the Initialization Root, the System Control Block, the 
Command List, and the Receive Frame Area (see 
Figure 6). 


The Initialization Root is in an established location 
known to the host CPU and the 82596 (00FFFFF6h). 
However, the CPU can establish the Initialization 
Root in another location by using the CPU Port ac- 
cess. This root is accessed during initialization, and 
points to the System Control Block. 


The System Control Block serves as a bidirectional 
mail drop for the host CPU and the 82596 CU and 
RU. It is the central point through which the CPU and 
the 82596 exchange control and status information. 
The SCB has two areas. The first contains instruc- 
tions from the CPU to the 82596. These include 
control of the CU and RU (Start, Abort, Suspend, 
and Resume), a pointer to the list of CU commands, 
a pointer to the Receive Frame Area, a set of Inter- 
rupt Acknowledge bits, and the T-ON and T-OFF 
timers for the bus throttle. The second area contains 
status information the 82596 is sending to the CPU, 
such as the CU and RU states (Idle, Active, 
Ready, Suspended, No Receive Resources, etc.), in- 
terrupt bits (Command Completed, Frame Received, 
CU Not Ready, and RU Not Ready), and statistical 
counters. 


The Command List functions as a program for the 
CU; individual commands are placed in memory 
units called Command Blocks (CBs). These CBs 
contain the parameters and status of specific high- 
level commands called Action Commands; e.g., 
Transmit or Configure. 


Transmit causes the 82596 to transmit a frame. The 
Transmit CB contains the destination address, the 
length field, and a pointer to a list of linked buffers 
holding the frame that is to be constructed from sev- 
eral buffers scattered throughout memory. The 
Command Unit operates without CPU intervention; 
the DMA for each buffer, and the prefetching of ref- 
erences to new buffers, is performed in parallel. The 
CPU is notified only after a transmission is complete. 


The Receive Frame Area is a list of Free Frame De- 
scriptors (descriptors not yet used) and a list of user- 
prepared buffers. Frames arrive at the 82596 unso- 
licited; the 82596 must always be ready to receive 
and store them in the Free Frame Area. The Re- 
ceive Unit fills the buffers when it receives frames, 
and reformats the Free Buffer List into received- 
frame structures. The frame structure is, for all prac- 
tical purposes, identical to the format of the frame to 
be transmitted. The first Frame Descriptor is refer- 
enced by the SCB. Unless the 82596 is configured 
to Save Bad Frames, the frame descriptor, and the 
associated buffer descriptor, which is wasted when 
a bad frame is received, are automatically reclaimed 
and returned to the Free Buffer List. 


Receive buffer chaining (storing incoming frames in 
a linked buffer list) significantly improves memory 
utilization. Without buffer chaining, the user must al- 
locate consecutive blocks of memory, each capable 
of containing a maximum frame (for Ethernet, 1518 
bytes). Since an average frame is about 200 bytes, 
this is very inefficient. With buffer chaining, the user 
can allocate small buffers and the 82596 will only 
use those that are needed. 
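
The memory saving from buffer chaining is easy to quantify. The sketch below compares the two allocation strategies using the figures quoted above (1518-byte maximum frame, 200-byte average frame). The 256-byte buffer size and the helper names are assumptions chosen for illustration.

```python
MAX_FRAME = 1518  # maximum Ethernet frame length in bytes, as quoted above


def memory_without_chaining(frame_lengths):
    """Each frame gets one block large enough for a maximum-size frame."""
    return len(frame_lengths) * MAX_FRAME


def memory_with_chaining(frame_lengths, buffer_size):
    """Each frame uses only as many small chained buffers as it needs."""
    total = 0
    for length in frame_lengths:
        buffers_needed = (length + buffer_size - 1) // buffer_size
        total += buffers_needed * buffer_size
    return total


frames = [200] * 10  # ten average-size frames
unchained = memory_without_chaining(frames)
chained = memory_with_chaining(frames, buffer_size=256)
```

With these numbers the unchained scheme reserves 15180 bytes for ten frames, while 256-byte chained buffers use 2560 bytes; a maximum-size frame simply occupies six chained buffers.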


Figure 7 A-D illustrates how the 82596 uses the 
Receive Frame Area. Figure 7A shows an unused 
Receive Frame Area composed of Free Frame De- 
scriptors and Free Receive Buffers prepared by the 
user. The SCB points to the first Frame Descriptor of 
the Frame Descriptor List. Figure 7B shows the 
same Receive Frame Area after receiving one 
frame. This first frame occupies two Receive Buffers 
and one Frame Descriptor; a valid received frame 
will only occupy one Frame Descriptor. After receiv- 
ing this frame the 82596 sets the next Free Frame 
Descriptor RBD pointer to the next Free RBD. Figure 
7C shows the RFA after receiving a second frame. 
In this example the second frame occupies only one 
Receive Buffer and one RFD. The 82596 again sets 
the RBD pointer. This process is repeated again in 
Figure 7D, showing the reception of another frame 
using one Receive Buffer; in this example there is an 
extra Frame Descriptor. 


TRANSMIT AND RECEIVE MEMORY 
STRUCTURES 


There are three memory structures for reception and 
transmission: the 82586 memory structure, the 
Flexible memory structure, and the Simplified memo- 
ry structure. The 82586 mode is selected by config- 
uring the 82596 during initialization. In this mode all 
the 82596 memory structures are compatible with 
the 82586 memory structures. 


When the 82596 is not configured to the 82586 
mode, the other two memory structures, Simplified 
and Flexible, are available for transmitting and re- 
ceiving. These structures can be selected on a 
frame-by-frame basis by setting the S/F bit in the 
Transmit Command and the Receive Frame De- 
scriptor (see Figures 29, 30, 41, and 42). The Simpli- 
fied memory structure offers a simple structure for 
ease of programming (see Figure 8). All information 
about a frame is contained in one structure; for ex- 
ample, during reception the RFD and data field are 
contained in one structure. 


The Flexible memory structure (see Figure 9) has a 
control field that allows the programmer to specify 
the amount of receive data the RFD will contain for 
receive operations and the amount of transmit data 
the Transmit Command Block will contain for trans- 
mit operations. For example, when the control field 
in the RFD is set to 20 bytes during a reception, the 
first 20 bytes of the data field are stored in the RFD 
(6 bytes of destination address, 6 bytes of source 
address, 2 bytes of length field, and 6 bytes of data) 
and the remainder of the data field is stored in the 
Receive Data Buffers. This is useful for capturing 
frame headers when header information is con- 
tained in the data field. The header information can 
then be automatically stored in the RFD partitioned 
from the Receive Data Buffer. 


The control field can also be used for the Transmit 
Command when the Flexible memory structure is 
used. The quantity of data field bytes to be transmit- 
ted from the Transmit Command Block is specified 
by the variable control field. 
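
The effect of the control field in the Flexible memory structure can be illustrated with a short model. This is a sketch of the partitioning idea only: `split_frame` is a hypothetical helper, and the frame is treated as a plain byte string made of destination address, source address, length field, and data, as in the 20-byte example above.

```python
def split_frame(frame, control_field):
    """Return (bytes kept in the descriptor, bytes sent to the data buffers)."""
    if control_field < 0:
        raise ValueError("control field must be non-negative")
    return frame[:control_field], frame[control_field:]


# A received frame: 6-byte destination, 6-byte source, 2-byte length, data.
destination = bytes.fromhex("ffffffffffff")
source = bytes.fromhex("00aa00123456")
data = bytes(range(40))
length_field = len(data).to_bytes(2, "big")
frame = destination + source + length_field + data

# With the control field set to 20, the descriptor receives the 14 header
# bytes plus the first 6 data bytes; the rest goes to the data buffers.
in_descriptor, in_buffers = split_frame(frame, 20)
```

Keeping the first bytes of the data field with the descriptor lets software inspect a higher-level header without touching the data buffers.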
Figure 7. Frame Reception in the RFA 


Figure 8. Simplified Memory Structure 


Figure 9. Flexible Memory Structure 


TRANSMITTING FRAMES 


The 82596 executes high-level Action Commands 
from the Command List in system memory. Action 
Commands are fetched and executed in parallel with 
the host CPU operation, thereby significantly improv- 
ing system performance. The format of the Action 
Commands is shown in Figure 10. Figure 28 shows 
the 82586 mode, and Figures 29 and 30 show the 
command formats of the Linear and 32-bit Segment- 
ed modes. 


A single Transmit command contains, as part of the 
command-specific parameters, the destination ad- 
dress and length field of the transmitted frame and a 
pointer to buffer area in memory containing the data 
portion of the frame. The data field is contained in a 
memory data structure consisting of a buffer de- 
scriptor (BD) and a data buffer—or a linked list of 
buffer descriptors and buffers—as shown in Figure 
11. 


Multiple data buffers can be chained together using 
the BDs. Thus, a frame with a long data field can be 
transmitted using several (shorter) data buffers 
chained together. This chaining technique allows the 
system designer to develop efficient buffer manage- 
ment. 


The 82596 automatically generates the preamble 
(alternating 1s and 0s) and start frame delimiter, 
fetches the destination address and length field from 
the Transmit command, inserts its unique address 
as the source address, fetches the data field speci- 
fied by the Transmit command, and computes and 
appends the CRC to the end of the frame (see Fig- 
ure 12). In the Linear and 32-bit Segmented mode 
the CRC can be optionally inserted on a frame-by- 
frame basis by setting the NC bit in the Transmit 
Command Block (see Figures 29 and 30). 
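
The frame assembly steps described above can be modeled in software. This is a minimal sketch under stated assumptions, not a description of the device internals: it assumes IEEE 802.3 framing with 6-byte addresses, a 7-byte preamble of 55h, a start frame delimiter of D5h (10101011 when sent least significant bit first), a big-endian length field, and a frame check sequence computed with the CRC-32 in Python's `zlib` and stored least significant byte first. `build_frame` and its `append_crc` flag (standing in for the NC bit) are hypothetical names.

```python
import struct
import zlib

PREAMBLE = bytes([0x55] * 7)  # alternating 1s and 0s, LSB sent first
SFD = bytes([0xD5])           # 10101011 on the wire, LSB sent first


def build_frame(destination, station_address, data, append_crc=True):
    """Assemble preamble, SFD, addresses, length, data, and optional CRC."""
    if len(destination) != 6 or len(station_address) != 6:
        raise ValueError("this sketch assumes 6-byte addresses")
    body = destination + station_address + struct.pack(">H", len(data)) + data
    if append_crc:
        # CRC-32 over the addresses, length field, and data.
        body += struct.pack("<I", zlib.crc32(body))
    return PREAMBLE + SFD + body


frame = build_frame(b"\xff" * 6, bytes.fromhex("00aa00123456"), b"x" * 46)
```

A 46-byte data field yields 64 bytes from destination address through CRC, plus the 8 bytes of preamble and delimiter.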


The 82596 can be configured to generate two types 
of start and end frame delimiters—End of Carrier 
(EOC) or HDLC. In EOC mode the start frame delimi- 
ter is 10101011 and the end frame delimiter is indi- 
cated by the lack of a signal after the last bit of the 
frame check sequence field has been transmitted. In 
EOC mode the 82596 can be configured to extend 
short frames by adding pad bytes (7Eh) during trans- 
mission, according to the length field. In HDLC mode 
the 82596 will generate the 01111110 flag for the 
start and end frame delimiters, and do standard bit 
stuffing and stripping. Furthermore, the 82596 can 
be configured to pad frames shorter than the speci- 
fied minimum frame length by appending the appro- 
priate number of flags to the end of the frame. 
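
Standard HDLC bit stuffing, mentioned above, inserts a 0 after every run of five consecutive 1s so that the 01111110 flag can never appear inside the frame body; the receiver strips those inserted bits. The sketch below models this on lists of bits. The function names are hypothetical, and the model ignores flag detection and abort sequences.

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]  # 7Eh, the HDLC frame delimiter


def stuff(bits):
    """Insert a 0 after every five consecutive 1s."""
    out = []
    run = 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            out.append(0)
            run = 0
    return out


def unstuff(bits):
    """Remove the 0 that follows every five consecutive 1s."""
    out = []
    run = 0
    skip_next = False
    for bit in bits:
        if skip_next:
            # This bit is the stuffed 0 added by the transmitter.
            skip_next = False
            run = 0
            continue
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            skip_next = True
    return out
```

After stuffing, data that happens to contain the flag pattern no longer matches it on the wire, which is why the flag can safely delimit and pad frames.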


When a collision occurs, the 82596 manages the 
jam, random wait, and retry processes, reinitializing 
DMA pointers without CPU intervention. Multiple 
frames can be sent by linking the appropriate num- 
ber of Transmit commands together. This is particu- 
larly useful when transmitting a message larger than 
the maximum frame size (1518 bytes for Ethernet). 


Figure 10. Action Command Format (Control, Command, and Status; 
Link Field pointing to the next command; Parameter Field holding 
command-specific parameters) 


Figure 11. Data Buffer Descriptor and 
Data Buffer Structure 


Figure 12. Frame Format 


RECEIVING FRAMES 


To reduce CPU overhead, the 82596 is designed to 
receive frames without CPU supervision. The host 
CPU first sets aside an adequate receive buffer 
space and then enables the 82596 Receive Unit. 
Once enabled, the RU watches for arriving frames 
and automatically stores them in the Receive Frame 
Area (RFA). The RFA contains Receive Frame De- 
scriptors, Receive Buffer Descriptors, and Data Buff- 
ers (see Figure 13). The individual Receive Frame 
Descriptors make up a Receive Descriptor List 
(RDL) used by the 82596 to store the destination 
and source addresses, the length field, and the 
status of each frame received (see Figure 14). 


Once enabled, the 82596 checks each passing 
frame for an address match. The 82596 will recog- 
nize its own unique address, one or more multicast 
addresses, or the broadcast address. If a match is 
found the 82596 stores the destination and source 
addresses and the length field in the next available 
RFD. It then begins filling the next available Data 
Buffer on the FBL, which is pointed to by the current 
RFD, with the data portion of the incoming frame. As 
one Data Buffer is filled, the 82596 automatically 
fetches the next DB on the FBL until the entire frame 
is received. This buffer chaining technique is particu- 
larly memory efficient because it allows the system 
designer to set aside buffers to fit frames much 
shorter than the maximum allowable frame length. If 
AL-LOC = 1, or if the flexible memory structure is 
used, the addresses and length field can be placed 
in the Receive Buffer. 


Once the entire frame is received without error, the 
82596 does the following housekeeping tasks. 


• The actual count field of the last Buffer Descriptor 
  used to hold the frame just received is updated 
  with the number of bytes stored in the associated 
  Data Buffer. 


• The next available Receive Frame Descriptor is 
  fetched. 


• The address of the next available Buffer Descriptor 
  is written to the next available Receive Frame 
  Descriptor. 


• A frame received interrupt status bit is posted in 
  the SCB. 


• An interrupt is sent to the CPU. 


If a frame error occurs, for example a CRC error, the 
82596 automatically reinitializes its DMA pointers 
and reclaims any data buffers containing the bad 
frame. The 82596 will continue to receive frames 
without CPU help as long as Receive Frame De- 
scriptors and Data Buffers are available. 


82596 NETWORK MANAGEMENT 
AND DIAGNOSTICS 


The behavior of data communication networks is 
normally very complex because of their distributed 
and asynchronous nature. It is particularly difficult to 
pinpoint a failure when it occurs. The 82596 has ex- 
tensive diagnostic and network management func- 
tions that help improve reliability and testability. The 
82596 reports on the following events after each 
frame is transmitted. 


• Transmission successful.
• Transmission unsuccessful. Lost Carrier Sense.
• Transmission unsuccessful. Lost Clear to Send.
• Transmission unsuccessful. A DMA underrun
  occurred because the system bus did not keep up
  with the transmission.
• Transmission unsuccessful. The number of
  collisions exceeded the maximum allowed.
• Number of Collisions. The number of collisions
  experienced during the frame.
• Heartbeat Indicator. This indicates the presence
  of a heartbeat during the last Interframe Spacing
  (IFS) after transmission.


When configured to Save Bad Frames the 82596 
checks each incoming frame and reports the follow- 
ing errors. 


• CRC error. Incorrect CRC in a properly aligned
  frame.
• Alignment error. Incorrect CRC in a misaligned
  frame.
• Frame too short. The frame is shorter than the
  value configured for minimum frame length.
• Overrun. Part of the frame was not placed in
  memory because the system bus did not keep up
  with incoming data.
• Out of buffer. Part of the frame was discarded
  because of insufficient memory storage space.
• Receive collision. A collision was detected during
  reception.
• Length error. A frame not matching the frame
  length parameter was detected.
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Figure 13. Receive Frame Area Diagram 
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Figure 14. Receive Frame Descriptor 
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NETWORK PLANNING AND 
MAINTENANCE 
To properly plan, operate, and maintain a
communication network, the network management entity
must accumulate information on network behavior.
The 82596 provides a rich set of network-wide
diagnostics that can serve as the basis for a network
management entity.


Information on network activity is provided in the
status of each frame transmitted. The 82596 reports
the following activity indicators after each frame.


• Number of collisions. The number of collisions
  the 82596 experienced while attempting to
  transmit the frame.
• Deferred transmission. During the first
  transmission attempt the 82596 had to defer to
  traffic on the link.


The 82596 updates its 32-bit statistical counters
after each received frame that both passes address
filtering and is longer than the Minimum Frame
Length configuration parameter. The 82596 reports
the following statistics.


• CRC errors. The number of well-aligned frames
  that experienced a CRC error.
• Alignment errors. The number of misaligned
  frames that experienced a CRC error.
• No resources. The number of frames that were
  discarded because of insufficient resources for
  reception.
• Overrun errors. The number of frames that were
  not completely stored in memory because the
  system bus did not keep up with incoming data.
• Receive Collision counter. The number of
  collisions detected during receive.
• Short Frame counter. The number of frames that
  were discarded because they were shorter than
  the configured minimum frame length.


The 82596 can be configured to Promiscuous mode.
In this mode it captures all frames transmitted on the
network without checking the Destination Address.
This is useful when implementing a monitoring
station to capture all frames for analysis.


A useful method of capturing frame headers is to
use the Simplified memory mode, configure the
82596 to Save Bad Frames, and configure the
82596 to Promiscuous mode with space in the RFD
allocated for a specific number of receive data bytes.
The 82596 will receive all frames and put them in the
RFD. Frames that exceed the available space in the
RFD will be truncated, the status will be updated,
and the 82596 will retrieve the next RFD. This allows
the user to capture the initial data bytes of each
frame (for instance, the header) and discard the
remainder of the frame.


The 82596 also has a monitor mode for network
analysis. During normal operation the receive
function enables the 82596 to receive frames that pass
address filtering. These frames must have the Start
of Frame Delimiter (SFD) field and must be longer
than the absolute minimum frame length of 5 bytes
(6 bytes in case of Multicast address filtering).
Contents and status of the received frames are
transferred to memory. The monitor function enables the
82596 to simply evaluate the incoming frames. The
82596 can monitor the frames that pass or do not
pass the address filtering. It can also monitor frames
which do not have the SFD fields. The 82596 can be
configured to only keep statistical information about
monitor frames. Three options are available in the
Monitor mode. These options are selected by the
two monitor mode configuration bits available in the
configuration command.


When the first option is selected, the 82596 receives
good frames that pass address filtering and
transfers them to memory while monitoring frames that
do not pass address filtering or are shorter than the
minimum frame size (these frames are not
transferred to memory). When this option is used the
82596 updates six counters: CRC errors, alignment
errors, no resource errors, overrun errors, short
frames and total good frames received.


When the second option is selected, the receive
function is completely disabled. The 82596 monitors
only those frames that pass address filtering and
meet the minimum frame length requirement. When
this option is used the 82596 updates six counters:
CRC errors, alignment errors, total frames (good and
bad), short frames, collisions detected and total
good frames.


When the third option is selected, the receive
function is completely disabled. The 82596 monitors all
frames, including frames that do not have a Start
Frame Delimiter. When this option is used the 82596
updates six counters: CRC errors, alignment errors,
total frames (good and bad), short frames, collisions
detected and total good frames.
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STATION DIAGNOSTICS 
AND SELF-TEST 


The 82596 provides a large set of diagnostic and
network management functions. These include
internal and external loopback and time domain
reflectometry for locating fault points in the network cable.


The 82596 ensures software reliability by dumping
the contents of the 82596 internal registers into
system memory. The 82596 has a self-test mode that
enables it to run an internal self-test and place the
results in system memory.


82586 SOFTWARE COMPATIBILITY 


The 82596 has a software-compatible state in which 
all its memory structures are compatible with the 
82586 memory structure. This includes all the Action 
Commands, the Receive Frame Area (including the 
RFD, Buffer Descriptors, and Data Buffers), the Sys- 
tem Control Block, and the initialization procedures. 


There are two minor differences between the 82596
in the 82586-Compatible memory structure and the
82586.


• When the internal and external loopback bits in
  the Configure command are set to 11 the 82596
  is in external loopback and the LPBK pin is
  activated; in the 82586 this situation would produce
  internal loopback.
• During a Dump command both the 82596 and
  82586 dump the same number of bytes; however,
  the data format is different.
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INITIALIZING THE 82596 


A Reset command is issued to the 82596 to prepare 
it for normal operation. The 82596 is initialized 
through two data structures that are addressed by 
two pointers, the System Configuration Pointer 
(SCP) and the Intermediate System Configuration 
Pointer (ISCP). The initialization procedure begins 
when a Channel Attention signal is asserted after 
RESET. The 82596 uses the address of the double
word that contains the SCP as a default:
00FFFFF4h. Before the CA signal is asserted this
default address can be changed to any other
available address by asserting the PORT pin and
providing the desired address over the D31-D4 pins of the
address bus. Pins D3-D0 must be 0010; i.e., any
alternative address must be aligned to 16-byte
boundaries. All addresses sent to the 82596 must be
word aligned, which means that all pointers and
memory structures must start on an even address
(A0 = zero).
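

The two alignment rules above are easy to check in software before anything is handed to the 82596. The following is a minimal sketch, not part of the Intel documentation; the helper names are hypothetical and the addresses are treated as plain 32-bit physical addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; only the rules come from the text above. */

/* An alternative SCP address must sit on a 16-byte boundary, because
 * the low four bits of the PORT word carry the function code (0010). */
static inline bool scp_address_is_valid(uint32_t addr)
{
    return (addr & 0xFu) == 0;
}

/* Every pointer and memory structure handed to the 82596 must start
 * on an even address (A0 = 0). */
static inline bool is_word_aligned(uint32_t addr)
{
    return (addr & 0x1u) == 0;
}
```

Note that the default SCP location 00FFFFF4h is word aligned but not 16-byte aligned; the 16-byte rule applies only to an alternative address written through the PORT interface.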


SYSTEM CONFIGURATION POINTER 
(SCP) 


The SCP contains the sysbus byte and the location
of the next structure of the initialization process, the
ISCP. The following parameters are selected in the
SYSBUS.

• The 82596 operation mode.
• The Bus Throttle timer triggering method.
• Lock enabled.
• Interrupt polarity.
• Big Endian 32-bit entity mode.


Byte ordering is determined by the LE/BE pin.
LE/BE=1 selects Little Endian byte ordering and
LE/BE=0 selects Big Endian byte ordering.


NOTE:
In the following, X indicates a bit not checked in
82586 mode. This bit must be set to 0 in all other
modes.
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The following diagram illustrates the format of the SCP. 


31           ODD WORD            16 15           EVEN WORD            0
X X X X X X X X |    SYSBUS     | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0      00FFFFF4h
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X        00FFFFF8h
A31                        ISCP ADDRESS                        A0      00FFFFFCh


SYSBUS byte (bits 23-16 of the first double word):


Big Endian 32-bit entity mode (bit 7)
0 - The 32-bit address pointers in Linear mode are treated
    as two 16-bit big endian entities. This is identical to
    the 82596 A1 stepping definition.
1 - The 32-bit address pointers in Linear mode are treated
    as 32-bit big endian entities. This mode is only supported
    in the 82596 B stepping. In this mode the SCB absolute
    address and statistical counters are still treated as two
    16-bit big endian entities.


Operation mode
00 : 82586 mode
01 : 32-Bit Segmented mode
10 : Linear mode
11 : Reserved


Bus Throttle timer triggering
0 : internal triggering of the Bus Throttle timers
1 : external triggering of the Bus Throttle timers


Lock function
0 : Lock function enabled
1 : Lock function disabled


Interrupt polarity
0 - Interrupt pin is active high
1 - Interrupt pin is active low


X : NOT CHECKED
ISCP ADDRESS - The physical address of the ISCP. In the 82586 mode, bits A31-A24 are considered to
be zero.


Figure 15. The System Configuration Pointer 


Writing the Sysbus


When writing the sysbus byte it is important to pay attention to the byte order.


• When a Little Endian processor is used, the sysbus byte is located at byte address 00FFFFF6h (or address
  n + 2 if an alternative SCP address n was programmed).


• When a processor using Big Endian byte ordering is used, the sysbus, alternative SCP, and ISCP addresses
  will be different.
  • The sysbus byte is located at 00FFFFF5h.
  • If an alternative SCP address is programmed, the sysbus byte should be at byte address n + 1.
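

The byte-address rule above reduces to a single offset from the SCP address. The sketch below is illustrative only; the function and macro names are hypothetical, and it simply restates the n + 2 / n + 1 rule from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCP_DEFAULT_ADDR 0x00FFFFF4u  /* default SCP location after RESET */

/* Byte address of the sysbus byte for an SCP located at scp_addr:
 * n + 2 with Little Endian ordering, n + 1 with Big Endian ordering. */
static inline uint32_t sysbus_byte_address(uint32_t scp_addr, bool big_endian)
{
    return scp_addr + (big_endian ? 1u : 2u);
}
```

With the default SCP address this yields 00FFFFF6h for a Little Endian processor and 00FFFFF5h for a Big Endian one, matching the values quoted above.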
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INTERMEDIATE SYSTEM CONFIGURATION POINTER (ISCP) 


The ISCP indicates the location of the System Control Block. Often the SCP is in ROM and the ISCP is in RAM.
The CPU loads the SCB address (or an equivalent data structure) into the ISCP and asserts CA. This Channel
Attention signal causes the 82596 to begin its initialization procedure and to get the SCB address from the
ISCP and SCP. In 82586 and 32-bit Segmented modes the SCB base address is also the base address of all
Command Blocks, Frame Descriptors, and Buffer Descriptors (but not buffers). All these data structures must
reside in one 64-KB segment; however, in Linear mode no such limitation is imposed.


The following diagram illustrates the ISCP format. 


31           ODD WORD            16 15        8 7      EVEN WORD      0


          SCB OFFSET               |          |         BUSY            ISCP


A31 (A23)                 SCB BASE ADDRESS                        A0    ISCP + 4


Bits A31-A24 of the SCB base address are X X X X X X X X (not checked) in 82586 mode
and A31-A24 in 32-bit Segmented mode.


BUSY - Indicates that the 82596 is being initialized. The CPU sets the ISCP to 01h before it gives
       the first CA to the 82596. The ISCP is cleared by the 82596 after the SCB base and offset
       are read. Note that the most significant byte of the first word of the ISCP is not modified
       when BUSY is cleared.


SCB OFFSET— This 16-bit quantity specifies the offset portion of the address of the SCB. 


SCB BASE — Specifies the base portion of the address of the SCB. The base of SCB is also the base of 
all 82596 Command Blocks, Frame Descriptors and Buffer Descriptors. In the 82586 
mode, bits A31—A24 are considered to be zero. 


Figure 16. The Intermediate System Configuration Pointer—82586 and 32-Bit Segmented Modes 


31           ODD WORD            16 15                 EVEN WORD      0


                                                        BUSY            ISCP


A31                     SCB ABSOLUTE ADDRESS                      A0    ISCP + 4


BUSY — Indicates that the 82596 is being initialized. The ISCP is set to 01h by the CPU before its 
first CA to the 82596. It is cleared by the 82596 after the SCB address is read. 


SCB ADDRESS— This 32-bit quantity specifies the physical address of the SCB. 


Figure 17. The Intermediate System Configuration Pointer—Linear Mode. 


INITIALIZATION PROCESS 


The CPU sets up the SCP, ISCP, and the SCB structures, and, if desired, an alternative SCP address. It also
sets BUSY to 01h. The 82596 is initialized when a Channel Attention signal follows a Reset signal, causing the
82596 to access the System Configuration Pointer. The sysbus byte, the operational mode, the bus throttle
timer triggering method, the interrupt polarity, and the state of LOCK are read. After reset the Bus Throttle
timers are essentially disabled: the T-ON value is infinite, the T-OFF value is zero. After the SCP is read, the
82596 reads the ISCP and saves the SCB address. In 82586 and 32-bit Segmented modes this address is
represented as a base address plus the offset (this base address is also the base address of all the control
blocks). In Linear mode the base address is also an absolute address. The 82596 clears BUSY, sets CX and
CNA to equal 1 in the SCB, clears the SCB command word, sends an interrupt to the CPU, and awaits another
Channel Attention signal. RESET configures the 82596 to its default state before CA is asserted.
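

The CPU side of this handshake can be sketched as follows for Linear mode on a little-endian host. This is an illustration under assumptions, not Intel's reference code: the structure and function names are hypothetical, `assert_ca` stands for whatever board-specific mechanism pulses the CA pin, and `sim_ca_clears_busy` is a host-side stand-in for the controller so the sketch can be tried without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* ISCP as used in Linear mode: BUSY in the low byte of the first
 * double word, SCB absolute address in the second. Names are ours. */
struct iscp_linear {
    volatile uint32_t busy;      /* only the low byte (01h) is meaningful */
    volatile uint32_t scb_addr;  /* physical address of the SCB */
};

/* Step 1: the CPU fills in the ISCP and marks it busy (BUSY = 01h). */
static inline void iscp_prepare(struct iscp_linear *iscp, uint32_t scb_phys)
{
    iscp->scb_addr = scb_phys;
    iscp->busy = 0x01u;
}

/* Step 2: assert CA, then poll until the 82596 clears BUSY.
 * assert_ca is a hypothetical board-specific hook; max_polls bounds
 * the wait so a dead controller cannot hang the caller. */
static inline bool iscp_start(struct iscp_linear *iscp,
                              void (*assert_ca)(void *ctx), void *ctx,
                              unsigned max_polls)
{
    assert_ca(ctx);
    while (max_polls--) {
        if ((iscp->busy & 0xFFu) == 0)
            return true;   /* the SCB address has been read */
    }
    return false;          /* BUSY never cleared */
}

/* Host-side stand-in for the hardware: pretends the 82596 saw CA,
 * read the SCB address, and cleared BUSY. For experimentation only. */
static inline void sim_ca_clears_busy(void *ctx)
{
    struct iscp_linear *iscp = ctx;
    iscp->busy = 0;
}
```

On real hardware the driver would also wait for the interrupt and check that CX and CNA are set in the SCB status word before issuing further commands.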
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CONTROLLING THE 82596CA 


The host CPU controls the 82596 with the commands, data structures, and methods described in this section. 
The CPU and the 82596 communicate through shared memory structures. The 82596 contains two indepen- 
dent units: the Command Unit and the Receive Unit. The Command Unit executes commands from the CPU, 
and the Receive Unit handles frame reception. These two units are controlled and monitored by the CPU 
through a shared memory structure called the System Control Block (SCB). The CPU and the 82596 use the 
CA and INT signals to communicate with the SCB.


82596 CPU ACCESS INTERFACE (PORT) 


The 82596 has a CPU access interface that allows the host CPU to do four things.
• Write an alternative System Configuration Pointer address.
• Write an alternative Dump area pointer and perform Dump.
• Execute a software reset.
• Execute a self-test.


The following events initiate the CPU access state.
• Presence of an address on the D31-D4 data bus pins.
• The D3-D0 pins are used to select one of the four functions.
• The PORT input pin is asserted, as in a regular write cycle.


NOTE:
The SCP, Dump, and Self-Test addresses must be 16-byte aligned.


The 82596 requires two 16-bit write cycles for a port command. The first write holds the internal machines and
reads the first 16 bits; the second activates the PORT command and reads the second 16 bits.


The PORT Reset is useful when only the 82596 needs to be reset. The CPU must wait for 10 system clocks and
5 serial clocks before issuing another CA to the 82596; this new CA begins a new initialization process.


The Dump function is useful for troubleshooting No Response problems. If the chip is in a No Response state,
the PORT Dump operation can be executed and a PORT Reset can be used to reinitialize the 82596 without
disturbing the rest of the system.


The Self-Test function can be used for board testing; the 82596 will execute a self-test and write the results to
memory.


Table 2. PORT Function Selection


Function     D31 ............................ D4      D3  D2  D1  D0
Reset        Don't Care                               0   0   0   0
Self-Test    A31   Self-Test Results Address   A4     0   0   0   1
SCP          A31   Alternative SCP Address     A4     0   0   1   0
Dump         A31   Dump Area Pointer           A4     0   0   1   1
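

A PORT command word can be assembled from a 16-byte-aligned address and a function code on D3-D0. The sketch below is a hedged illustration: all names are hypothetical, the 0010 code for the alternative SCP comes from the text, and the codes for Reset, Self-Test, and Dump follow the customary 82596 encoding and should be checked against Table 2. The order in which the two 16-bit halves are written is not stated in this section, so the code only splits the word and leaves the ordering to the caller.

```c
#include <stdint.h>

/* Function codes carried on D3-D0. Enumerator names are ours. */
enum port_function {
    PORT_RESET     = 0x0,
    PORT_SELF_TEST = 0x1,
    PORT_ALT_SCP   = 0x2,
    PORT_DUMP      = 0x3
};

/* Build the 32-bit PORT word: address bits A31-A4 on D31-D4 and the
 * function code on D3-D0. The address is ignored for Reset.
 * Returns 0 on success, -1 if addr is not 16-byte aligned. */
static inline int port_encode(enum port_function fn, uint32_t addr,
                              uint32_t *out)
{
    if (fn != PORT_RESET && (addr & 0xFu) != 0)
        return -1;
    *out = (fn == PORT_RESET ? 0u : addr) | (uint32_t)fn;
    return 0;
}

/* Split the word into the two 16-bit values needed for the two
 * 16-bit write cycles of a PORT command. */
static inline void port_split(uint32_t word, uint16_t *lo, uint16_t *hi)
{
    *lo = (uint16_t)(word & 0xFFFFu);
    *hi = (uint16_t)(word >> 16);
}
```

Rejecting a misaligned address up front is a deliberate choice: silently masking the low bits would change the function code or point the 82596 at the wrong structure.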


MEMORY ADDRESSING FORMATS 
The 82596 accesses memory by 32-bit addresses. There are two types of 32-bit addresses: linear and
segmented. The type of address used depends on the 82596 operating mode and the type of memory structure it
is addressing. The 82596 has three operating modes.


• 82586 Mode
  • A Linear address is a single 24-bit entity. Address pins A31-A24 are always zero.
  • A Segmented address uses a 24-bit base and a 16-bit offset.
• 32-bit Segmented Mode
  • A Linear address is a single 32-bit entity.
  • A Segmented address uses a 32-bit base and a 16-bit offset.


NOTE:
In the previous two memory addressing modes, each command header (CB, TBD, RFD, RBD, and SCB)
must wholly reside within one segment. If the 82596 encounters a memory structure that does not follow this
restriction, the 82596 will fetch the next contiguous location in memory (beyond the segment).


• Linear Mode
  • A Linear address is a single 32-bit entity.
  • There are no Segmented addresses.


Linear addresses are primarily used to address transmit and receive data buffers. In the 82586 and 32-bit
Segmented modes, segmented addresses (base plus offset) are used for all Command Blocks, Buffer
Descriptors, Frame Descriptors, and System Control Blocks. When using Segmented addresses, only the offset
portion of the entity being addressed is specified in the block. The base for all offsets is the same: that of the
SCB. See Table 1.
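

The base-plus-offset scheme can be illustrated with a small helper. This is a sketch under the assumption that the physical address is the plain sum of the SCB base and the 16-bit offset, with the upper eight address bits forced to zero in 82586 mode; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Physical address of a control structure (CB, RFD, RBD, TBD) from the
 * shared SCB base and the 16-bit offset stored in the block.
 * In 82586 mode only 24 address bits exist, so A31-A24 read as zero. */
static inline uint32_t segmented_to_physical(bool mode_82586,
                                             uint32_t scb_base,
                                             uint16_t offset)
{
    uint32_t addr = scb_base + offset;
    if (mode_82586)
        addr &= 0x00FFFFFFu;
    return addr;
}
```

Because every offset is only 16 bits wide and shares one base, all control structures necessarily fall inside a single 64-KB window above that base.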


LITTLE ENDIAN AND BIG ENDIAN BYTE ORDERING 
The 82596 supports both Little Endian and Big Endian byte ordering for its memory structures.


The 82596 A1 stepping supports Big Endian byte ordering for word and byte entities. Dword entities are not
supported with 82596 A1 Big Endian byte ordering. This results in slightly different 82596 A1 memory
structures for Big Endian operation. These structures are defined in the 32-Bit LAN Components User's Manual.


The 82596 B stepping supports Big Endian byte ordering for Linear mode only. All 82596 B 32-bit address
pointers are treated as 32-bit Big Endian entities; however, the SCB absolute address and statistical counters
are treated as two 16-bit Big Endian entities. This 32-bit Big Endian entity support is configured through bit 7 in
the SYSBUS byte.


NOTE:
All 82596 memory entities must be word or dword aligned, except the transmit buffers can be byte aligned
for the 82596 B-Stepping.


An example of a dword entity is a frame descriptor command/status dword, whereas the raw data of the frame
are byte entities. Both 32- and 16-bit buses are supported. When a 16-bit bus is used with Big Endian memory
organization, data lines D15-D0 are used. The 82596 has an internal crossover that handles these swap
operations.
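

The difference between a "32-bit Big Endian entity" and "two 16-bit Big Endian entities" can be made concrete by looking at the bytes in memory. The sketch below is an interpretation, not a statement from the datasheet: it ASSUMES that in the two-word form the low-order word occupies the lower address, with each word stored most significant byte first. Function names are hypothetical.

```c
#include <stdint.h>

/* Store v as one 32-bit Big Endian entity: most significant byte at
 * the lowest address. */
static inline void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Store v as two 16-bit Big Endian entities.
 * ASSUMPTION: the low-order word comes first in memory; each word is
 * individually big endian. */
static inline void put_two_be16(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 8);   /* low word, high byte  */
    out[1] = (uint8_t)v;          /* low word, low byte   */
    out[2] = (uint8_t)(v >> 24);  /* high word, high byte */
    out[3] = (uint8_t)(v >> 16);  /* high word, low byte  */
}
```

Under that assumption, a driver on a Big Endian CPU would swap the 16-bit halves of the SCB absolute address and of each statistical counter, while leaving other B-stepping pointers untouched when SYSBUS bit 7 is set.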


COMMAND UNIT (CU)
The Command Unit is the logical unit that executes Action Commands from a list of commands very similar to
a CPU program. A Command Block is associated with each Action Command. The CU is modeled as a logical
machine that takes, at any given time, one of the following states.


• Idle. The CU is not executing a command and is not associated with a CB on the list. This is the initial state.
• Suspended. The CU is not executing a command; however, it is associated with a CB on the list.
• Active. The CU is executing an Action Command and pointing to its CB.


The CPU can affect CU operation in two ways: by issuing a CU Control Command or by setting bits in the
Command word of the Action Command.


RECEIVE UNIT (RU) 


The Receive Unit is the logical unit that receives frames and stores them in memory. The RU is modeled as a 
logical machine that takes, at any given time, one of the following states. 


• Idle. The RU has no memory resources and is discarding incoming frames. This is the initial state.


• No Resources. The RU has no memory resources and is discarding incoming frames. This state differs
  from Idle in that the RU accumulates statistics on the number of discarded frames.


• Suspended. The RU has memory available for storing frames, but is discarding them. The suspend state
  can only be reached if the CPU forces this through the SCB or sets the suspend bit in the RFD.


• Ready. The RU has memory available and is storing incoming frames.


The CPU can affect RU operation in three ways: by issuing an RU Control Command, by setting bits in the
Frame Descriptor Command word of the frame being received, or by setting the EL bit of the current buffer's
Buffer Descriptor.


SYSTEM CONTROL BLOCK (SCB)

The SCB is a memory block that plays a major role in communications between the CPU and the 82596. Such
communications include the following.

• Commands issued by the CPU
• Status reported by the 82596


Control commands are sent to the 82596 by writing them into the SCB and then asserting CA. The 82596
examines the command, performs the required action, and then clears the SCB command word. Control
commands perform the following types of tasks.


• Operation of the Command Unit (CU). The SCB controls the CU by specifying the address of the Command
  Block List (CBL) and by starting, suspending, resuming, or aborting execution of CBL commands.


• Operation of the Bus Throttle. The SCB controls the Bus Throttle timers by providing them with new values
  and sending the Load and Start timer commands. The timers can be operated in both the 32-bit Segmented
  and Linear modes.


• Reception of frames by the Receive Unit (RU). The SCB controls the RU by specifying the address of the
  Receive Frame Area and by starting, suspending, resuming, or aborting frame reception.


• Acknowledgment of events that cause interrupts.
• Resetting the chip.


The 82596 sends status reports to the CPU via the System Control Block. The SCB contains four types of
status reports.


• The cause of the current interrupts. These interrupts are caused by one or more of the following 82596
  events.
  • The Command Unit completes an Action Command that has its I bit set.
  • The Receive Unit receives a frame.
  • The Command Unit becomes inactive.
  • The Receive Unit becomes not ready.
• The status of the Command Unit.
• The status of the Receive Unit.
• Status reports from the 82596 regarding reception of corrupted frames.
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Events can be cleared only by CPU acknowledgment. If some events are not acknowledged by the ACK field 
the Interrupt signal (INT) will be reissued after Channel Attention (CA) is processed. Furthermore, if a new 
event occurs while an interrupt is set, the interrupt is temporarily cleared to trigger edge-triggered interrupt
controllers.


The CPU uses the Channel Attention line to cause the 82596 to examine the SCB. This signal is trailing-edge 
triggered—the 82596 latches CA on the trailing edge. The latch is cleared by the 82596 before the SCB 
control command is read. 


31            ODD WORD            16 15            EVEN WORD            0
 ACK | X | CUC | R | RUC | X X X X | STAT | 0 | CUS | 0 | RUS | 0 0 0 0     SCB
            RFA OFFSET             |             CBL OFFSET                 SCB + 4
         ALIGNMENT ERRORS          |             CRC ERRORS                 SCB + 8
          OVERRUN ERRORS           |           RESOURCE ERRORS              SCB + 12


Figure 18. SCB—82586 Mode 
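

For a little-endian host, the 82586-mode SCB of Figure 18 maps naturally onto a structure of eight 16-bit fields, with the even word of each double word at the lower address. The field names below are hypothetical, and the sketch assumes the compiler inserts no padding between `uint16_t` members (true for mainstream ABIs).

```c
#include <stddef.h>
#include <stdint.h>

/* SCB layout in 82586 mode, little-endian host. Field names are ours;
 * offsets follow Figure 18. */
struct scb_82586 {
    volatile uint16_t status;      /* SCB + 0  (even word): STAT/CUS/RUS */
    volatile uint16_t command;     /* SCB + 2  (odd word): ACK/CUC/R/RUC */
    volatile uint16_t cbl_offset;  /* SCB + 4  */
    volatile uint16_t rfa_offset;  /* SCB + 6  */
    volatile uint16_t crc_errs;    /* SCB + 8  */
    volatile uint16_t aln_errs;    /* SCB + 10 */
    volatile uint16_t rsc_errs;    /* SCB + 12 */
    volatile uint16_t ovrn_errs;   /* SCB + 14 */
};
```

The members are declared `volatile` because both the CPU and the 82596 modify this block; a driver would additionally need whatever cache maintenance or barriers its platform requires.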


31            ODD WORD            16 15            EVEN WORD            0
 ACK | 0 | CUC | R | RUC | 0 0 0 0 | STAT | 0 | CUS | RUS | T | 0 0 0       SCB


                          RESOURCE ERRORS (*)


*In Monitor mode these counters change function.


Figure 19. SCB—32-Bit Segmented Mode 


31            ODD WORD            16 15            EVEN WORD            0
 ACK | 0 | CUC | R | RUC | 0 0 0 0 | STAT | 0 | CUS | RUS | T | 0 0 0       SCB


*In Monitor mode these counters change function.


Figure 20. SCB—Linear Mode 
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Command Word 
31                                                      16
 ACK | 0 | CUC | R | RUC | 0 0 0 0                          SCB + 2

These bits specify the action to be performed as a result of a CA. This word is set by the CPU and cleared by
the 82596. Defined bits are:


Bit 31      ACK-CX  - Acknowledges that the CU completed an Action Command.
Bit 30      ACK-FR  - Acknowledges that the RU received a frame.
Bit 29      ACK-CNA - Acknowledges that the Command Unit became not active.
Bit 28      ACK-RNR - Acknowledges that the Receive Unit became not ready.
Bits 24-26  CUC     - (3 bits) This field contains the command to the Command Unit. Valid values are:
                      0 - NOP (does not affect current state of the unit).
                      1 - Start execution of the first command on the CBL. If a command is executing,
                          complete it before starting the new CBL. The beginning of the CBL is in CBL
                          OFFSET (address).
                      2 - Resume the operation of the Command Unit by executing the next command.
                          This operation assumes that the Command Unit has been previously
                          suspended.
                      3 - Suspend execution of commands on CBL after current command is complete.
                      4 - Abort current command immediately.
                      5 - Loads the Bus Throttle timers so they will be initialized with their new values
                          after the active timer (T-ON or T-OFF) reaches Terminal Count. If no timer is
                          active new values will be loaded immediately. This command is not valid in
                          82586 mode.
                      6 - Loads and immediately restarts the Bus Throttle timers with their new values.
                          This command is not valid in 82586 mode.
                      7 - Reserved.
Bit 23      RESET   - Reset chip (logically the same as hardware RESET).
Bits 20-22  RUC     - (3 bits) This field contains the command to the Receive Unit. Valid values are:
                      0 - NOP (does not alter current state of unit).
                      1 - Start reception of frames. The beginning of the RFA is contained in the RFA
                          OFFSET (address). If a frame is being received complete reception before
                          starting.
                      2 - Resume frame reception (only when in the suspended state).
                      3 - Suspend frame reception. If a frame is being received complete its reception
                          before suspending.
                      4 - Abort receiver operation immediately.
                      5-7 - Reserved.
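

The bit assignments above translate directly into masks and shifts on the 16-bit command word (bits 31-16 of the first SCB double word, so bit 31 becomes bit 15 of the word). The following sketch uses hypothetical macro, enumerator, and function names; only the bit positions and command values come from the definitions above.

```c
#include <stdint.h>

/* ACK bits, expressed within the 16-bit command word. */
#define SCB_ACK_CX   0x8000u  /* bit 31 of the dword */
#define SCB_ACK_FR   0x4000u  /* bit 30 */
#define SCB_ACK_CNA  0x2000u  /* bit 29 */
#define SCB_ACK_RNR  0x1000u  /* bit 28 */
#define SCB_RESET    0x0080u  /* bit 23 */

/* Command Unit commands (CUC, bits 26-24). */
enum cuc {
    CUC_NOP, CUC_START, CUC_RESUME, CUC_SUSPEND, CUC_ABORT,
    CUC_LOAD_TIMERS, CUC_LOAD_RESTART_TIMERS
};

/* Receive Unit commands (RUC, bits 22-20). */
enum ruc { RUC_NOP, RUC_START, RUC_RESUME, RUC_SUSPEND, RUC_ABORT };

/* Compose a command word from acknowledge bits and unit commands. */
static inline uint16_t scb_command(uint16_t ack, enum cuc cu, enum ruc ru)
{
    return (uint16_t)((ack & 0xF000u) |
                      ((unsigned)cu << 8) |
                      ((unsigned)ru << 4));
}
```

A typical interrupt handler would echo the STAT bits it has just serviced back as ACK bits, combine them with any unit command it wants to issue, write the result to the SCB, and then assert CA.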
Status Word

82586 mode

15                                                       0
 STAT | 0 | CUS | 0 | RUS | 0 0 0 0                         SCB


32-Bit Segmented and Linear mode

15                                                       0
 STAT | 0 | CUS | RUS | T | 0 0 0                           SCB


Indicates the status of the 82596. This word is modified only by the 82596. Defined bits are:


Bit 15      CX   - The CU finished executing a command with its I (interrupt) bit set.
Bit 14      FR   - The RU finished receiving a frame.
Bit 13      CNA  - The Command Unit left the Active state.
Bit 12      RNR  - The Receive Unit left the Ready state.
Bits 8-10   CUS  - (3 bits) This field contains the status of the command unit. Valid values are:
                   0   - Idle
                   1   - Suspended
                   2   - Active
                   3-7 - Not used
Bits 4-7    RUS  - This field contains the status of the receive unit. Valid values are:
                   0h (0000) - Idle
                   1h (0001) - Suspended
                   2h (0010) - No Resources. This value indicates both no resources due to lack of
                               RFDs in the RDL and no resources due to lack of RBDs in the FBL.
                   4h (0100) - Ready
                   Ah (1010) - No resources due to no more RBDs (not in the 82586 mode).
                   Ch (1100) - No more RBDs (not in 82586 mode)
                   No other combinations are allowed
Bit 3       T    - Bus Throttle timers loaded (not in 82586 mode).
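

Decoding the status word is a matter of extracting the fields listed above. This sketch uses hypothetical type and function names. It takes CNA to be bit 13 (the position between FR at bit 14 and RNR at bit 12) and the T flag to be bit 3, as the SCB figures for the 32-bit modes suggest; both should be confirmed against the status word diagram.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the SCB status word. Names are ours. */
struct scb_status {
    bool cx;       /* bit 15: CU finished a command with its I bit set */
    bool fr;       /* bit 14: RU finished receiving a frame */
    bool cna;      /* bit 13: CU left the Active state (assumed position) */
    bool rnr;      /* bit 12: RU left the Ready state */
    unsigned cus;  /* bits 10-8: 0 Idle, 1 Suspended, 2 Active */
    unsigned rus;  /* bits 7-4: 0h Idle, 1h Suspended, 2h No Resources, 4h Ready */
    bool t;        /* bit 3: Bus Throttle timers loaded (not in 82586 mode) */
};

static inline struct scb_status scb_decode_status(uint16_t w)
{
    struct scb_status s;
    s.cx  = (w & 0x8000u) != 0;
    s.fr  = (w & 0x4000u) != 0;
    s.cna = (w & 0x2000u) != 0;
    s.rnr = (w & 0x1000u) != 0;
    s.cus = (w >> 8) & 0x7u;
    s.rus = (w >> 4) & 0xFu;
    s.t   = (w & 0x0008u) != 0;
    return s;
}
```

For example, a status word of 8240h decodes as CX set, CU Active, and RU Ready.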


SCB OFFSET ADDRESSES 


CBL Offset (Address) 


In 82586 and 32-bit Segmented modes this 16-bit quantity indicates the offset portion of the address for the 
first Command Block on the CBL. In Linear mode it is a 32-bit linear address for the first Command Block on 
the CBL. It is accessed only if CUC equals Start. 


RFA Offset (Address) 


In 82586 and 32-bit Segmented modes this 16-bit quantity indicates the offset portion of the address for the 
Receive Frame Area. In Linear mode it is a 32-bit linear address for the Receive Frame Area. It is accessed
only if RUC equals Start.
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SCB STATISTICAL COUNTERS 


Statistical Counter Operation


• The CPU is responsible for clearing all error counters before initializing the 82596. The 82596 updates
  these counters by reading them, adding 1, and then writing them back to the SCB.


• The counters are wraparound counters. After reaching FFFFFFFFh the counters wrap around to zero.


• The 82596 updates the required counters for each frame. It is possible for more than one counter to be
  updated; multiple errors will result in all affected counters being updated.


• The 82596 executes the read-counter/increment/write-counter operation without relinquishing the bus
  (locked operation). This is to ensure that no logical contention exists between the 82596 and the CPU due
  to both attempting to write to the counters simultaneously. In the dual-port memory configuration the CPU
  should not execute any write operation to a counter if LOCK is asserted.


• The counters are 32 bits wide and their behavior is fully compatible with the IEEE 802.3 standard. The
  82596 supports all relevant statistics (mandatory, optional, and desired) through the status of the transmit
  and receive headers and directly through SCB statistics.
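

Because the counters wrap from FFFFFFFFh to zero, software that samples them periodically can compute the number of events between two samples with ordinary unsigned subtraction, without clearing the counters while the 82596 is running. The function name below is hypothetical; the sketch assumes at most one wraparound between samples.

```c
#include <stdint.h>

/* Number of events between two samples of a 32-bit wraparound counter.
 * Unsigned subtraction is performed modulo 2^32, so the result is
 * correct even if the counter wrapped once between the samples. */
static inline uint32_t counter_delta(uint32_t previous, uint32_t current)
{
    return current - previous;
}
```

Reading without writing also sidesteps the contention issue described above, since only the 82596 ever modifies the counter.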


CRCERRS 


This 32-bit quantity contains the number of aligned frames discarded because of a CRC error. This counter is 
updated, if needed, regardless of the RU state. 


ALNERRS 


This 32-bit quantity contains the number of frames that both are misaligned (i.e., where CRS deasserts on a
nonoctet boundary) and contain a CRC error. The counter is updated, if needed, regardless of the RU state.


SHRTFRM 
This 32-bit quantity contains the number of received frames shorter than the minimum frame length. 


The last three counters change function in monitor mode. 


RSCERRS


This 32-bit quantity contains the number of good frames discarded because there were no resources to
contain them. Frames intended for a host whose RU is in the No Receive Resources state fall into this
category. This counter is updated only if the RU is in the No Resources state. When in Monitor mode this
counter counts the total number of frames, good and bad.
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OVRNERRS 


This 32-bit quantity contains the number of frames known to be lost because the local system bus was not 
available. If the traffic problem lasts longer than the duration of one frame, the frames that follow the first are 
lost without an indicator, and they are not counted. This counter is updated, if needed, regardless of the RU 
state. 


RCVCDT 


This 32-bit quantity contains the number of collisions detected during frame reception. In Monitor mode this
counter counts the total number of good frames. 


ACTION COMMANDS AND OPERATING MODES 


This section lists all the Action Commands of the Command Unit Command Block List (CBL). Each command 
contains the Command field, the Status and Control fields, the link to the next Action Command, and any 
command-specific parameters. There are three basic types of action commands: 82596 Configuration and 
Setup, Transmission, and Diagnostics. The following is a list of the actual commands. 


• NOP                           • Transmit
• Individual Address Setup      • TDR
• Configure                     • Dump
• MC Setup                      • Diagnose


The 82596 has three addressing modes. In the 82586 mode all the Action Commands look exactly like those 
of the 82586. 


• 82586 Mode. The 82596 software and memory structure is compatible with the 82586.


• 32-Bit Segmented Mode. The 82596 can access the entire system memory and use the two new memory
  structures (Simplified and Flexible) while still using the segmented approach. This does not require any
  significant changes to existing software.


• Linear Mode. The 82596 operates in a flat, linear, 4-gigabyte memory space without segmentation. It can
  also use the two new memory structures.


In the 32-bit Segmented mode there are some differences between the 82596 and 82586 action commands,
mainly in programming and activating new 82596 features. Those bits marked "don't care" in the compatible
mode are not checked; however, we strongly recommend that those bits all be zeroes; this will allow future
enhancements and extensions.


In the Linear mode all of the address offsets become 32-bit address pointers. All new 82596 features are
accessible in this mode, and all bits previously marked "don't care" must be zeroes.


The Action Commands, and all other 82596 memory structures, must begin on even byte boundaries, i.e., they
must be word aligned.


NOP


This command results in no action by the 82596 except for those performed in the normal command processing.
It is used to manipulate the CBL. The format of the NOP command is shown in Figure 21.


[Figure 21. NOP command block. Two layouts are shown: "82586 and 32-Bit Segmented Modes", where the
command word (EL, S, I, CMD, C, B, OK) is followed by the LINK OFFSET, and "Linear Mode", where it is
followed by the LINK ADDRESS.]
where:
LINK POINTER     — In the 82586 or 32-bit Segmented modes this is a 16-bit offset to the next Command
                   Block. In the Linear mode this is the 32-bit address of the next Command Block.
EL               — If set, this bit indicates that this command block is the last on the CBL.
S                — If set to one, suspend the CU upon completion of this CB.
I                — If set to one, the 82596 will generate an interrupt after execution of the command is
                   complete. If I is not set to one, the CX bit will not be set.
CMD (bits 16-18) — The NOP command. Value: 0h.
Bits 19-28       — Reserved (zero in the 32-bit Segmented and Linear modes).
C                — This bit indicates the execution status of the command. The CPU initially resets it to zero
                   when the Command Block is placed on the CBL. Following a command completion, the
                   82596 will set it to one.
B                — This bit indicates that the 82596 is currently executing the NOP command. It is initially
                   reset to zero by the CPU. The 82596 sets it to one when execution begins and to zero
                   when execution is completed. This bit is also set when the 82596 prefetches the
                   command.

NOTE:
The C and B bits are modified in one operation.

OK               — Indicates that the command was executed without error. If set to one no error occurred
                   (command executed OK). If zero an error occurred.
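The command word fields described above can be sketched as bit masks. The CMD position (bits 16-18) and the reserved range (bits 19-28) come from the text, and OK at bit 13 is stated later for the Transmit and Diagnose commands. The positions of EL, S, and I (bits 31-29) and of C and B (bits 15 and 14) are assumptions read off the field order in the figure, and the helper names are hypothetical.

```python
EL_BIT = 1 << 31   # last command block on the CBL (assumed position)
S_BIT = 1 << 30    # suspend the CU after this CB (assumed position)
I_BIT = 1 << 29    # interrupt after completion (assumed position)
CMD_SHIFT = 16     # CMD occupies bits 16-18
CMD_MASK = 0x7
C_BIT = 1 << 15    # command completed (assumed position)
B_BIT = 1 << 14    # command busy (assumed position)
OK_BIT = 1 << 13   # command executed without error

CMD_NOP = 0x0


def make_command_word(cmd: int, el: bool = False, suspend: bool = False,
                      interrupt: bool = False) -> int:
    """Build the first double word of a Command Block as the CPU would.

    Reserved bits 19-28 and the C, B, and OK status bits are left at zero.
    """
    if not 0 <= cmd <= CMD_MASK:
        raise ValueError("CMD is a 3-bit field")
    word = cmd << CMD_SHIFT
    if el:
        word |= EL_BIT
    if suspend:
        word |= S_BIT
    if interrupt:
        word |= I_BIT
    return word


def decode_command_word(word: int) -> dict:
    """Split a command word into its control and status fields."""
    return {
        "el": bool(word & EL_BIT),
        "s": bool(word & S_BIT),
        "i": bool(word & I_BIT),
        "cmd": (word >> CMD_SHIFT) & CMD_MASK,
        "c": bool(word & C_BIT),
        "b": bool(word & B_BIT),
        "ok": bool(word & OK_BIT),
    }
```

A driver would typically write the word produced by `make_command_word`, then poll or wait for the interrupt and check `c` and `ok` in the decoded result.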


Individual Address Setup


This command is used to load the 82596 with the Individual Address. This address is used by the 82596 for 
inserting the Source Address during transmission and recognizing the Destination Address during reception. 
After RESET, and prior to Individual Address Setup Command execution, the 82596 assumes the Broadcast 
Address is the Individual Address in all aspects, i.e.: 


• This will be the Individual Address Match reference.
• This will be the Source Address of a transmitted frame (for AL-LOC=0 mode only).

The format of the Individual Address Setup command is shown in Figure 22.


[Figure 22. Individual Address Setup command block. Two layouts are shown: "IA Setup, 82586 and 32-Bit
Segmented Modes", where the command word is followed by the LINK OFFSET and the INDIVIDUAL ADDRESS
(1st through 6th byte), and "IA Setup, Linear Mode", where the command word is followed by the
INDIVIDUAL ADDRESS (1st through 6th byte).]

where:
LINK ADDRESS,      — As per standard Command Block (see the NOP command for details).
EL, B, C, I, S
A                  — Indicates that the command was abnormally terminated due to a CU Abort control
                     command. If one, then the command was aborted, and if necessary it should be
                     repeated. If this bit is zero, the command was not aborted.
Bits 19-28         — Reserved (zero in the 32-bit Segmented and Linear modes).
CMD (bits 16-18)   — The Address Setup command. Value: 1h.
INDIVIDUAL ADDRESS — The individual address of the node, 0 to 6 bytes long.


The least significant bit of the Individual Address must be zero for Ethernet (see the Command Structure).
However, no enforcement of 0 is provided by the 82596. Thus, an Individual Address with 1 as its least
significant bit is a valid Individual Address in all aspects.


The default address length is 6 bytes, as in 802.3. If a different length is used the IA Setup command
should be executed after the Configure command.


Configure 


The Configure command loads the 82596 with its operating parameters. It allows changing some of the 
parameters by specifying a byte count less than the maximum number of configuration bytes (11 in the 82586 
mode, 14 in the 32-Bit Segmented and Linear modes). The 82596 configuration depends on its mode of 
operation. When configuring the 12th byte (Byte 11 undefined) in 82586 mode this byte should be all ones. 


• In the 82586 mode the maximum number of configuration bytes is 12. Any number larger than 12 will be
  reduced to 12 and any number less than 4 will be increased to 4.


• The additional features of the serial side are disabled in the 82586 mode.


• In both the 32-Bit Segmented and Linear modes there are four additional configuration bytes, which hold
  parameters for additional 82596 features. If these parameters are not accessed, the 82596 will follow their
  default values.


• For more detailed information refer to the 32-Bit LAN Components User's Manual.


The format of the Configure command is shown in Figures 23, 24, and 25.


[Figure 23. CONFIGURE—82586 Mode: command word (EL, S, I, CMD, C, B, OK, A), LINK OFFSET, and
configuration Bytes 0 through 10.]


[Figure 24. CONFIGURE—32-Bit Segmented Mode: command word, LINK OFFSET, and configuration Bytes 0
through 13.]


[Figure 25. CONFIGURE—Linear Mode: command word, LINK ADDRESS, and configuration Bytes 0 through 13.]


LINK ADDRESS,    — As per standard Command Block (see the NOP command for details).
EL, B, C, I, S
A                — Indicates that the command was abnormally terminated due to a CU Abort control
                   command. If 1, then the command was aborted and if necessary it should be repeated.
                   If this bit is 0, the command was not aborted.
Bits 19-28       — Reserved (zero in the 32-Bit Segmented and Linear Modes).
CMD (bits 16-18) — The CONFIGURE command. Value: 2h.


The interpretation of the fields follows: 


BYTE 0
BYTE CNT (Bits 0-3)         Byte Count. Number of bytes, including this one, that hold
                            parameters to be configured.
PREFETCHED (Bit 7)          Enable the 82596 to write the prefetched bit in all prefetch
                            RBDs.

NOTE:
The P bit is valid only in the new memory structure modes. In 82586 mode this bit is disabled (i.e., no
prefetched mark).


BYTE 1
FIFO Limit (Bits 0-3)       FIFO limit.
MONITOR # (Bits 6-7)        Receive monitor options. If the Byte Count of the configure
                            command is less than 12 bytes then these Monitor bits are ignored.
DEFAULT: C8h


BYTE 2
RESUME_RD (Bit 1)           0 — The 82596 does not reread the next CB on the list when a CU Resume
                                Control Command is issued.
                            1 — The 82596 will reread the next CB on the list when a CU Resume
                                Control Command is issued. This is available only on the 82596 B
                                stepping.
SAV BF (Bit 7)              0 — Received bad frames are not saved in the memory.
                            1 — Received bad frames are saved in the memory.
DEFAULT: 40h
BYTE 3
ADR LEN (Bits 0-2)          Address length (any kind).
NO SRC ADD INS (Bit 3)      No Source Address Insertion.
                            In the 82586 this bit is called AL LOC.
PREAM LEN (Bits 4-5)        Preamble length.
LP BCK MODE (Bits 6-7)      Loopback mode.
DEFAULT: 26h


BYTE 4
LIN PRIO (Bits 0-2)         Linear Priority.
EXP PRIO (Bits 4-6)         Exponential Priority.
BOF METD (Bit 7)            Exponential Backoff method.
DEFAULT: 00h


BYTE 5
INTERFRAME SPACING          Interframe spacing.
DEFAULT: 60h


BYTE 6
SLOT TIME (L)               Slot time, low byte.
DEFAULT: 00h


BYTE 7
SLOT TIME (H) (Bits 0-2)    Slot time, high part.
RETRY NUM (Bits 4-7)        Number of transmission retries on collision.
DEFAULT: F2h


BYTE 8
PRM (Bit 0)                 Promiscuous mode.
BC DIS (Bit 1)              Broadcast disable.
MANCH/NRZ (Bit 2)           Manchester or NRZ encoding. See specific timing requirements
                            for TXC in Manchester mode.
TONO CRS (Bit 3)            Transmit on no CRS.
NOCRC INS (Bit 4)           No CRC insertion.
CRC-16/CRC-32 (Bit 5)       CRC type.
BIT STF (Bit 6)             Bit stuffing.
PAD (Bit 7)                 Padding.
DEFAULT: 00h


BYTE 9
CRSF (Bits 0-2)             Carrier Sense filter (length).
CRS SRC (Bit 3)             Carrier Sense source.
CDTF (Bits 4-6)             Collision Detect filter (length).
CDT SRC (Bit 7)             Collision Detect source.
DEFAULT: 00h


BYTE 10
MIN FRAME LEN               Minimum frame length.
DEFAULT: 40h


BYTE 11
PRECRS (Bit 0)              Preamble until Carrier Sense.
LNGFLD (Bit 1)              Length field. Enables padding at the End-of-Carrier framing (802.3).
CRCINM (Bit 2)              Rx CRC appended to the frame in memory.
AUTOTX (Bit 3)              Auto retransmit when a collision occurs during the preamble.
CDBSAC (Bit 4)              Collision Detect by source address recognition.
MC_ALL (Bit 5)              Enable to receive all MC frames.
MONITOR (Bits 6-7)          Receive monitor options.
DEFAULT: FFh


BYTE 12
FDX (Bit 6)                 Enables Full Duplex operation.
DEFAULT: 00h


BYTE 13
MULT_IA (Bit 6)             Multiple individual address.
DIS_BOF (Bit 7)             Disable the backoff algorithm.
DEFAULT: 3Fh


A reset (hardware or software) configures the 82596 according to the following defaults.


Table 4. Configuration Defaults

[The Default Value column of this table is not legible in this copy; only the Parameter and
Units/Meaning columns are reproduced.]

Parameter                      Units/Meaning
-----------------------------  ----------------------------------
ADDRESS LENGTH                 Bytes
A/L FIELD LOCATION             Located in FD
AUTO RETRANSMIT                Auto Retransmit Enable
BITSTUFFING/EOC                EOC
BROADCAST DISABLE              Broadcast Reception Enabled
CDBSAC                         Disabled
CDT FILTER                     Bit Times
CDT SRC                        External Collision Detection
CRC IN MEMORY                  CRC Not Transferred to Memory
CRC-16/CRC-32                  CRC-32
CRS FILTER                     0 Bit Times
CRS SRC                        External CRS
DISBOF                         Backoff Enabled
EXT LOOPBACK                   Disabled
EXPONENTIAL PRIORITY           802.3 Algorithm
EXPONENTIAL BACKOFF METHOD     802.3 Algorithm
FULL DUPLEX (FDX)              CSMA/CD Protocol (No FDX)
FIFO THRESHOLD                 TX: 32 Bytes, RX: 64 Bytes
INT LOOPBACK                   Disabled
INTERFRAME SPACING             Bit Times
LINEAR PRIORITY                802.3 Algorithm
LENGTH FIELD                   Padding Disabled
MIN FRAME LENGTH               Bytes
MC ALL                         Disabled
MONITOR                        Disabled
MANCHESTER/NRZ                 NRZ
MULTI IA                       Disabled
NUMBER OF RETRIES              Maximum Number of Retries
NO CRC INSERTION               CRC Appended to Frame
PREFETCH BIT IN RBD            Disabled (Valid Only in New Modes)
PREAMBLE LENGTH                Bytes
PREAMBLE UNTIL CRS             Disabled
PROMISCUOUS MODE               Address Filter On
PADDING                        No Padding
SLOT TIME                      Bit Times
SAVE BAD FRAME                 Discards Bad Frames
TRANSMIT ON NO CRS             Disabled


NOTES:
1. This configuration setup is compatible with the IEEE 802.3 specification.
2. The asterisk "*" signifies a new configuration parameter not available in the 82586.
3. The default value of the Auto retransmit configuration parameter is enabled.
4. The double asterisk "**" signifies IEEE 802.3 requirements.
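The per-byte DEFAULT values listed for the Configure command can be unpacked into individual parameters using the bit ranges given in the byte descriptions. The following is a minimal sketch for Bytes 3, 6, and 7 only; the function and key names are hypothetical, and combining SLOT TIME (H) and SLOT TIME (L) into one 11-bit number is an assumption based on their "high part" and "low byte" labels.

```python
def decode_config_bytes(byte3: int = 0x26, byte6: int = 0x00,
                        byte7: int = 0xF2) -> dict:
    """Decode Configure Bytes 3, 6, and 7 (arguments default to reset values).

    Byte 3: ADR LEN (bits 0-2), NO SRC ADD INS (bit 3),
            PREAM LEN (bits 4-5), LP BCK MODE (bits 6-7).
    Byte 6: SLOT TIME low byte.
    Byte 7: SLOT TIME high part (bits 0-2), RETRY NUM (bits 4-7).
    """
    return {
        "address_length": byte3 & 0x07,
        "no_src_addr_insertion": (byte3 >> 3) & 0x01,
        "preamble_length_code": (byte3 >> 4) & 0x03,
        "loopback_mode": (byte3 >> 6) & 0x03,
        "slot_time": ((byte7 & 0x07) << 8) | byte6,
        "retry_number": (byte7 >> 4) & 0x0F,
    }
```

With the listed defaults (26h, 00h, F2h) this gives a 6-byte address length, a slot time of 512, and 15 retries.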
Multicast-Setup


This command is used to load the 82596 with the Multicast-IDs that should be accepted. As noted previously,
the filtering done on the Multicast-IDs is not perfect and some unwanted frames may be accepted. This
command resets the current filter and reloads it with the specified Multicast-IDs. The format of the
Multicast-addresses setup command is:


[Figure 26. MC Setup—82586 and 32-Bit Segmented Modes: command word, LINK OFFSET, MC COUNT, and the
MULTICAST ADDRESSES LIST (1st byte through Nth byte).]


[Figure 27. MC Setup—Linear Mode: command word, MC COUNT, and the MULTICAST ADDRESSES LIST (1st byte
through Nth byte).]


where:

LINK ADDRESS,    — As per standard Command Block (see the NOP command for details).
EL, B, C, I, S

A                — Indicates that the command was abnormally terminated due to a CU Abort control
                   command. If one, then the command was aborted and if necessary it should be
                   repeated. If this bit is zero, the command was not aborted.

Bits 19-28       — Reserved (0 in both the 32-Bit Segmented and Linear Modes).

CMD (bits 16-18) — The MC SETUP command. Value: 3h.

MC-CNT           — This 14-bit field indicates the number of bytes in the MC LIST field. The MC CNT
                   must be a multiple of the ADDR LEN; otherwise, the 82596 reduces the MC CNT to
                   the nearest ADDR LEN multiple. MC CNT = 0 implies resetting the Hash table,
                   which is equivalent to disabling the Multicast filtering mechanism.

MC LIST          — A list of Multicast Addresses to be accepted by the 82596. The least significant bit
                   of each MC address must be 1.


NOTE:
The list is sequential; i.e., the most significant byte of an address is immediately followed by the least
significant byte of the next address.


                 — When the 82596 is configured to recognize multiple Individual Addresses (Multi-IA),
                   the MC-Setup command is also used to set up the Hash table for the individual
                   addresses.
                   The least significant bit in the first byte of each IA address must be 0.
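The MC CNT rule above (a 14-bit byte count that is reduced to a whole number of addresses) can be sketched as follows. The helper names are hypothetical, the default address length of 6 bytes is taken from the Individual Address Setup text, and checking the multicast bit in the first byte of each address is an assumption carried over from the wording used for IA addresses.

```python
def normalize_mc_count(mc_cnt: int, addr_len: int = 6) -> int:
    """Round MC CNT down to a multiple of the address length."""
    if not 0 <= mc_cnt < (1 << 14):
        raise ValueError("MC CNT is a 14-bit field")
    if addr_len <= 0:
        raise ValueError("address length must be positive")
    return mc_cnt - (mc_cnt % addr_len)


def build_mc_list(addresses: list, addr_len: int = 6) -> bytes:
    """Concatenate multicast addresses into a sequential MC LIST.

    Each address is a bytes object; addresses follow one another with no
    padding, as the NOTE about the list being sequential describes.
    """
    for address in addresses:
        if len(address) != addr_len:
            raise ValueError("every address must be addr_len bytes long")
        if (address[0] & 0x01) == 0:
            raise ValueError("least significant bit of an MC address must be 1")
    return b"".join(addresses)
```

The length of the returned list is the value a driver would place in MC CNT; an empty list gives MC CNT = 0, which the text says resets the Hash table.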
Transmit


This command is used to transmit a frame of user data onto the serial link. The format of a Transmit command
is as follows:


[Figure 28. TRANSMIT—82586 Mode: command word with status bits, TBD OFFSET, LINK OFFSET, DESTINATION
ADDRESS (1st through 6th byte), and LENGTH FIELD.]


[Figure 29. TRANSMIT—32-Bit Segmented Mode: command word with status bits, TBD OFFSET, LINK OFFSET,
and TCB COUNT, with the frame data starting at offset 12.]


[Figure 30. TRANSMIT—Linear Mode: command word with status bits, LINK ADDRESS, TRANSMIT BUFFER
DESCRIPTOR ADDRESS, and TCB COUNT, with the DESTINATION ADDRESS starting at offset 16.]


COMMAND WORD

NC — 0: No CRC Insertion disable; when the configure command is configured to not insert the CRC
        during transmission the NC bit has no effect.
     1: No CRC Insertion enable; when the configure command is configured to insert the CRC during
        transmission the CRC will not be inserted when NC = 1.

SF — 0: Simplified Mode, all the Tx data is in the Transmit Command Block. The Transmit Buffer
        Descriptor Address field is all 1s.
     1: Flexible Mode. Data is in the TCB and in a linked list of TBDs.


where:

EL, B, C, I, S    — As per standard Command Block (see the NOP command for details).

OK (Bit 13)       — Error free completion.

A (Bit 12)        — Indicates that the command was abnormally terminated due to a CU Abort control
                    command. If 1, then the command was aborted, and if necessary it should be
                    repeated. If this bit is 0, the command was not aborted.

Bits 19-28        — Reserved (0 in the 32-bit Segmented and Linear modes).

CMD (Bits 16-18)  — The transmit command: 4h.

Status Bit 11     — Late collision. A late collision (a collision after the slot time has elapsed) is detected.

Status Bit 10     — No Carrier Sense signal during transmission. Carrier Sense signal is monitored
                    from the end of Preamble transmission until the end of the Frame Check Sequence.
                    For TONOCRS = 1 (Transmit On No Carrier Sense mode) it indicates that transmission
                    has been executed despite a lack of CRS. For TONOCRS = 0 (Ethernet
                    mode), this bit also indicates unsuccessful transmission (transmission stopped
                    when lack of Carrier Sense has been detected).

Status Bit 9      — Transmission unsuccessful (stopped) due to Loss of CTS.

Status Bit 8      — Transmission unsuccessful (stopped) due to DMA Underrun; i.e., the system did
                    not supply data for transmission.

Status Bit 7      — Transmission Deferred, i.e., transmission was not immediate due to previous link
                    activity.

Status Bit 6      — Heartbeat Indicator. Indicates that after a previously performed transmission, and
                    before the most recently performed transmission (Interframe Spacing), the CDT
                    signal was monitored as active. This indicates that the Ethernet Transceiver
                    Collision Detect logic is performing properly. The Heartbeat is monitored during the
                    Interframe Spacing period.

Status Bit 5      — Transmission attempt was stopped because the number of collisions exceeded the
                    maximum allowable number of retries.

Status Bit 4      — 0 (Reserved).

MAX-COL           — The number of Collisions experienced during this frame. Max Col = 0 plus S5 = 1
(Bits 3-0)          indicates 16 collisions.

LINK OFFSET       — As per standard Command Block (see the NOP Command for details).

TBD POINTER       — In the 82586 and 32-bit Segmented modes this is the offset of the first Tx Buffer
                    Descriptor containing the data to be transmitted. In the Linear mode this is the
                    32-bit address of the first Tx Buffer Descriptor on the list. If the TBD POINTER is
                    all 1s it indicates that no TBD is used.

DEST ADDRESS      — Contains the Destination Address of the frame. The least significant bit (MC)
                    indicates the address type.
                    MC = 0: Individual Address.
                    MC = 1: Multicast or Broadcast Address.
                    If the Destination Address bits are all 1s this is a Broadcast Address.

LENGTH FIELD      — The contents of this 2-byte field are user defined. In 802.3 it contains the length of
                    the data field. It is placed in memory in the same order it is transmitted; i.e., most
                    significant byte first, least significant byte second.

TCB COUNT         — This 14-bit counter indicates the number of bytes that will be transmitted from the
                    Transmit Command Block, starting from the third byte after the TCB COUNT field
                    (address N+12 in the 32-bit Segmented mode, N+16 in the Linear mode). The
                    TCB COUNT field can be any number of bytes (including an odd byte); this allows
                    the user to transmit a frame with a header having an odd number of bytes. The
                    TCB COUNT field is not used in the 82586 mode.

EOF Bit           — Indicates that the whole frame is kept in the Transmit Command Block. In the
                    Simplified memory model it must always be asserted.
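The status bits listed above lend themselves to a small decoder. This is a sketch only: the bit numbers are the ones given in the text, while the flag names and the function name are hypothetical. It also applies the stated rule that MAX-COL = 0 together with status bit 5 means 16 collisions.

```python
# Bit number -> flag name, following the Transmit status bit list.
TX_STATUS_FLAGS = {
    13: "ok",
    12: "aborted",
    11: "late_collision",
    10: "no_carrier_sense",
    9: "cts_lost",
    8: "dma_underrun",
    7: "deferred",
    6: "heartbeat",
    5: "max_retries_exceeded",
}


def decode_tx_status(status: int) -> dict:
    """Decode the low 16 status bits of a completed Transmit command."""
    result = {name: bool((status >> bit) & 1)
              for bit, name in TX_STATUS_FLAGS.items()}
    max_col = status & 0x0F
    # MAX-COL = 0 with status bit 5 set indicates 16 collisions.
    if max_col == 0 and result["max_retries_exceeded"]:
        result["collisions"] = 16
    else:
        result["collisions"] = max_col
    return result
```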
The interpretation of what is transmitted depends on the No Source Address insertion configuration bit and the
memory model being used.


NOTES:

1. The Destination Address and the Length Field are sequential. The Length Field immediately follows the
   most significant byte of the Destination Address.

2. In case the 82596 is configured with No Source Address insertion bit equal to 0, the 82596 inserts its
   configured Source Address in the transmitted frame.

   • In the 82586 mode, or when the Simplified memory model is used, the Destination and Length fields of the
     transmitted frame are taken from the Transmit Command Block.

   • If the FLEXIBLE memory model is used, the Destination and Length fields of the transmitted frame can be
     found either in the TCB or TBD, depending on the TCB COUNT.

3. If the 82596 is configured with the Address/Length Field Location equal to 1, the 82596 does not insert its
   configured Source Address in the transmitted frame. The first (2 x Address Length) + 2 bytes of the
   transmitted frame are interpreted as Destination Address, Source Address, and Length fields respectively.
   The location of the first transmitted byte depends on the operational mode of the 82596:

   • In the 82586 mode, it is always the first byte of the first Tx Buffer.
   • In both the 32-bit Segmented and Linear modes it depends on the SF bit and TCB COUNT:

     - In the Simplified memory mode the first transmitted byte is always the third byte after the TCB COUNT
       field.
     - In the Flexible mode, if the TCB COUNT is greater than 0 then it is the third byte after the TCB COUNT
       field. If TCB COUNT equals 0 then it is the first byte of the first Tx Buffer.

   • Transmit frames shorter than six bytes are invalid. The transmission will be aborted (only in 82586 mode)
     because of a DMA Underrun.

4. Frames which are aborted during transmission are jammed. Such an interruption of transmission can be
   caused by any reason indicated by any of the status bits 8, 9, 10 and 12.


Jamming Rules
1. Jamming will not start before completion of preamble transmission.


2. Collisions detected during transmission of the last 11 bits will not result in jamming.


The format of a Transmit Buffer Descriptor is: 


[Figure 31. Transmit Buffer Descriptor. Three layouts are shown. 82586 Mode: EOF and SIZE (ACT COUNT) in
the even word at offset 0 with the NEXT TBD OFFSET in the odd word, and the TRANSMIT BUFFER ADDRESS at
offset 4. 32-Bit Segmented Mode: the same fields, with the bit between EOF and SIZE shown as 0. Linear
Mode: EOF and SIZE (ACT COUNT) at offset 0, NEXT TBD ADDRESS at offset 4, and TRANSMIT BUFFER ADDRESS at
offset 8.]


where:
EOF               — This bit indicates that this TBD is the last one associated with the frame being
                    transmitted. It is set by the CPU before transmit.

SIZE (ACT COUNT)  — This 14-bit quantity specifies the number of bytes that hold information for the
                    current buffer. It is set by the CPU before transmission.

NEXT TBD ADDRESS  — In the 82586 and 32-bit Segmented modes, it is the offset of the next TBD on the
                    list. In the Linear mode this is the 32-bit address of the next TBD on the list. It is
                    meaningless if EOF = 1.

BUFFER ADDRESS    — The starting address of the memory area that contains the data to be sent. In the
                    82586 mode, this is a 24-bit address (A31-A24 are considered to be zero). In the
                    32-bit Segmented and Linear modes this is a 32-bit address. This buffer can be
                    byte aligned for the 82596 B step.
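A Linear-mode Transmit Buffer Descriptor chain can be sketched by packing three 32-bit words per descriptor. Several things here are assumptions rather than statements from the text: EOF is placed at bit 15 (read off the figure's bit ruler), memory is treated as little-endian, descriptors are laid out back to back, and the NEXT TBD ADDRESS of the last descriptor is written as 0 because the text calls it meaningless when EOF = 1. The function names are hypothetical.

```python
import struct

EOF_BIT = 1 << 15    # assumed position, from the bit ruler in Figure 31
SIZE_MASK = 0x3FFF   # SIZE (ACT COUNT) is a 14-bit quantity
TBD_SIZE = 12        # three 32-bit words per Linear-mode TBD


def pack_tbd_linear(size: int, next_tbd_addr: int, buffer_addr: int,
                    eof: bool) -> bytes:
    """Pack one Linear-mode TBD: size/EOF word, next TBD, buffer address."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("SIZE is a 14-bit field")
    word0 = size | (EOF_BIT if eof else 0)
    return struct.pack("<III", word0, next_tbd_addr, buffer_addr)


def build_tbd_chain(base_addr: int, buffers: list) -> bytes:
    """Build contiguous TBDs for a list of (buffer_addr, size) pairs.

    base_addr is the address where the first TBD will be placed, so each
    NEXT TBD ADDRESS points TBD_SIZE bytes past the current descriptor.
    """
    chain = b""
    for index, (buffer_addr, size) in enumerate(buffers):
        last = index == len(buffers) - 1
        next_addr = 0 if last else base_addr + (index + 1) * TBD_SIZE
        chain += pack_tbd_linear(size, next_addr, buffer_addr, eof=last)
    return chain
```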


TDR 


This operation activates Time Domain Reflectometry, which is a mechanism to detect open or short circuits on
the link and their distance from the diagnosing station. The TDR command has no parameters. The TDR
transmit sequence was changed, compared to the 82586, to form a regular transmission. The TDR bit stream
is as follows.


— Preamble
— Source address
— Another Source address (the TDR frame is transmitted back to the sending station,
  so DEST ADR = SRC ADR).
— Data field containing 7Eh patterns.
— Jam Pattern, which is the inverse CRC of the transmitted frame.


Maximum length of the TDR frame is 2048 bits. If the 82596 senses collision while transmitting the TDR frame
it transmits the jam pattern and stops the transmission. The 82596 then triggers an internal timer (STC); the
timer is reset at the beginning of transmission and reset if CRS is returned. The timer measures the time
elapsed from the start of transmission until an echo is returned. The echo is indicated by Collision Detect going
active or a drop in the Carrier Sense signal. The following table lists the possible cases that the 82596 is able
to analyze.


Conditions of TDR as Interpreted by the 82596

Condition                                   Ethernet Transceiver            Non Ethernet Transceiver
------------------------------------------  ------------------------------  ------------------------
Carrier Sense was inactive for a            Short or Open on the            NA
2048-bit time period                        Transceiver Cable
Carrier Sense signal dropped                Short on the Ethernet cable
Collision Detect went active                Open on the Ethernet cable      Open on the Serial Link
The Carrier Sense Signal did not drop or    No Problem                      No Problem
the Collision Detect did not go active
within 2048-bit time period


An Ethernet transceiver is defined as one that returns transmitted data on the receive pair and activates the
Carrier Sense Signal while transmitting. A Non-Ethernet Transceiver is defined as one that does not do so.


The format of the Time Domain Reflectometer command is:


[Figure 32. TDR. Two layouts are shown. 82586 and 32-Bit Segmented Modes: command word, LINK OFFSET, and a
result word holding LNK OK, XVR PRB, ET OPN, ET SRT, and TIME (11 bits). Linear Mode: command word and the
same result word.]


where:

LINK ADDRESS,      — As per standard Command Block (see the NOP command for details).
EL, B, C, I, S

A                  — Indicates that the command was abnormally terminated due to a CU Abort control
                     command. If one, then the command was aborted, and if necessary it should be
                     repeated. If this bit is zero, the command was not aborted.

Bits 19-28         — Reserved (0 in the 32-bit Segmented and Linear Modes).

CMD (Bits 16-18)   — The TDR command. Value: 5h.

TIME               — An 11-bit field that specifies the number of TxC cycles that elapsed before an echo
                     was observed. No echo is indicated by a reception consisting of "1s" only.
                     Because the network contains various elements such as transceiver links,
                     transceivers, Ethernet, repeaters etc., the TIME is not exactly proportional to the
                     problem's distance.

LNK OK (Bit 15)    — No link problem identified. TIME = 7FFh.

XCVR PRB (Bit 14)  — Indicates a Transceiver problem. Carrier Sense was inactive for a 2048-bit time
                     period. LNK OK = 0. TIME = 7FFh.

ET OPN (Bit 13)    — The transmission line is not properly terminated. Collision Detect went active and
                     LNK OK = 0.

ET SRT (Bit 12)    — There is a short circuit on the transmission line. Carrier Sense Signal dropped and
                     LNK OK = 0.
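The TDR result word described above can be decoded with a few masks. The bit positions and the 11-bit TIME field are taken from the field list; the function names and the summary strings are hypothetical, and the order in which the conditions are checked is an assumption.

```python
LNK_OK = 1 << 15     # no link problem identified
XCVR_PRB = 1 << 14   # transceiver problem
ET_OPN = 1 << 13     # line not properly terminated (open)
ET_SRT = 1 << 12     # short circuit on the line
TIME_MASK = 0x7FF    # 11-bit count of TxC cycles before the echo


def decode_tdr_result(result: int) -> dict:
    """Split the TDR result word into its flags and TIME field."""
    return {
        "link_ok": bool(result & LNK_OK),
        "transceiver_problem": bool(result & XCVR_PRB),
        "open": bool(result & ET_OPN),
        "short": bool(result & ET_SRT),
        "time": result & TIME_MASK,
    }


def summarize_tdr(result: int) -> str:
    """Return a one-line description of a TDR result word."""
    fields = decode_tdr_result(result)
    if fields["link_ok"]:
        return "no link problem identified"
    if fields["transceiver_problem"]:
        return "transceiver problem"
    if fields["open"]:
        return f"line not properly terminated, echo after {fields['time']} TxC cycles"
    if fields["short"]:
        return f"short circuit, echo after {fields['time']} TxC cycles"
    return "no condition reported"
```

As the TIME description warns, the cycle count is only a rough indication of distance, so the sketch reports it without converting it to a length.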
DUMP


This command causes the contents of various 82596 registers to be placed in a memory area specified by the
user. It is supplied as an 82596 self-diagnostic tool, and to provide registers of interest to the user. The format
of the DUMP command is:


[Figure 33. Dump. Two layouts are shown. 82586 and 32-Bit Segmented Modes: command word, LINK OFFSET, and
BUFFER OFFSET. Linear Mode: command word, LINK ADDRESS, and BUFFER ADDRESS.]


where:

LINK ADDRESS,      — As per standard Command Block (see the NOP command for details).
EL, B, C, I, S

OK                 — Indicates error free completion.

Bits 19-28         — Reserved (0 in the 32-bit Segmented and Linear Modes).

CMD (Bits 16-18)   — The Dump command. Value: 6h.

BUFFER POINTER     — In the 82586 and 32-bit Segmented modes this is the 16-bit-offset portion of the
                     dump area address. In the Linear mode this is the 32-bit linear address of the dump
                     area.


Dump Area Information Format 


• The 82596 is not Dump compatible with the 82586 because of the 32-bit internal architecture. In 82586
  mode the 82596 will dump the same number of bytes as the 82586. The compatible data will be marked
  with an asterisk.

• In 82586 mode the dump area is 170 bytes.
• The DUMP area format of the 32-bit Segmented and Linear modes is described in Figure 35.
• The size of the dump area of the 32-bit Segmented and Linear modes is 304 bytes.

• When the Dump is executed by the Port command an extra word will be appended to the Dump Area. The
  extra word is a copy of the Dump Area status word (containing the C, B, and OK Bits). The C and OK Bits
  are set when the 82596 has completed the Port Dump command.


Offset  Contents (16 bits per entry, bits 15-0)
------  -------------------------------------------
00      DMA CONTROL REGISTER
02      CONFIGURE BYTES* 3, 2
04      CONFIGURE BYTES* 5, 4
06      CONFIGURE BYTES* 7, 6
08      CONFIGURE BYTES* 9, 8
0A      CONFIGURE BYTES* 10
0C      I.A. BYTES 1, 0*
0E      I.A. BYTES 3, 2*
10      I.A. BYTES 5, 4*
12      LAST T.X. STATUS*
14      T.X. CRC BYTES 1, 0*
16      T.X. CRC BYTES 3, 2*
18      R.X. CRC BYTES 1, 0*
1A      R.X. CRC BYTES 3, 2*
1C      R.X. TEMP MEMORY 1, 0*
1E      R.X. TEMP MEMORY 3, 2*
20      R.X. TEMP MEMORY 5, 4*
22      LAST RECEIVED STATUS*
24      HASH REGISTER BYTES 1, 0*
26      HASH REGISTER BYTES 3, 2*
28      HASH REGISTER BYTES 5, 4*
2A
2C      SLOT TIME COUNTER*
2E      WAIT TIME COUNTER*
30-6A   MICRO MACHINE** REGISTER FILE, 60 BYTES
6C      MICRO MACHINE LFSR**
6E-7A   MICRO MACHINE** FLAG ARRAY, 14 BYTES
7C-82   QUEUE MEMORY** CU PORT, 8 BYTES
84      MICRO MACHINE ALU**
86      RESERVED**
88      M.M. TEMP A ROTATE R**
8A      M.M. TEMP A**
8C      T.X. DMA BYTE COUNT**
8E      M.M. INPUT PORT ADDRESS**
90      T.X. DMA ADDRESS
92      M.M. OUTPUT PORT**
94      R.X. DMA BYTE COUNT**
96      M.M. OUTPUT PORT ADDRESS REGISTER**
98      R.X. DMA ADDRESS**
9A      RESERVED**
9C      BUS THROTTLE TIMERS
9E      DIU CONTROL REGISTER**
A0      RESERVED**
A2      DMA CONTROL REGISTER**
A4      BIU CONTROL REGISTER**
A6      M.M. DISPATCHER REG.**
A8      M.M. STATUS REGISTER**

*The 82596 is not Dump compatible with the 82586 because of the 32-bit internal architecture. In 82586
mode the 82596 will dump the same number of bytes as the 82586.

**These bytes are not user defined, results may vary from Dump command to Dump command.


Figure 34. Dump Area Format—82586 Mode
[Figure 35. Dump Area Format—Linear and 32-Bit Segmented Mode. A 32-bit-wide map (bits 31-0). The entries
still legible in this copy are I.A. BYTES, SLOT TIME COUNTER, RECEIVE FRAME LENGTH, MICRO MACHINE** REGISTER
FILE (128 BYTES), MICRO MACHINE LFSR**, MICRO MACHINE** FLAG ARRAY (28 BYTES), M.M. INPUT PORT** (16 BYTES),
M.M. DISPATCHER REG.**, and M.M. STATUS REGISTER**.]

**These bytes are not user defined, results may vary from Dump command to Dump command.


Diagnose

The Diagnose Command triggers an internal self-test procedure that checks internal 82596 hardware, which
includes:

• Exponential Backoff Random Number Generator (Linear Feedback Shift Register).
• Exponential Backoff Timeout Counter.
• Slot Time Period Counter.
• Collision Number Counter.
• Exponential Backoff Shift Register.
• Exponential Backoff Mask Logic.
• Timer Trigger Logic.


This procedure checks the operation of the Backoff block, which resides in the serial side and is not easily
controlled. The Diagnose command is performed in two phases.


The format of the 82596 Diagnose command is: 


82586 and 32-Bit Segmented Modes
31          ODD WORD          16 15          EVEN WORD          0
EL | S | I | X X X X X X X X X X | CMD = 7 | C | B | OK | 0 | F | 0 0 0 0 0 0 0 0 0 0 0      0
X X X X X X X X X X X X X X X X | A15        LINK OFFSET        A0      4


Linear Mode
31          ODD WORD          16 15          EVEN WORD          0
EL | S | I | 0 0 0 0 0 0 0 0 0 0 | CMD = 7 | C | B | OK | 0 | F | 0 0 0 0 0 0 0 0 0 0 0      0
A31                      LINK ADDRESS                      A0      4
Figure 36. Diagnose 

where:
LINK ADDRESS, EL, B, C, I, S - As per standard Command Block (see the NOP command for details).
Bits 19-28 - Reserved (0 in the 32-bit Segmented and Linear Modes).
CMD (bits 16-18) - The Diagnose command. Value: 7h.
OK (bit 13) - Indicates error free completion.
F (bit 11) - Indicates that the self-test procedure has failed.
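
The field list above can be turned into a small status decoder. The following is a minimal sketch in Python, not driver code: the positions of CMD, OK, and F come from the list above, while the positions of C (bit 15) and B (bit 14) are assumed from the standard Command Block layout, and the function name `decode_diagnose` is hypothetical.

```python
# CMD, OK and F positions are taken from the field list above.
# C (bit 15) and B (bit 14) are ASSUMED from the standard Command Block layout.
CMD_SHIFT, CMD_MASK = 16, 0x7
CMD_DIAGNOSE = 0x7
BIT_C, BIT_B, BIT_OK, BIT_F = 15, 14, 13, 11


def decode_diagnose(word: int) -> dict:
    """Decode the first 32-bit word of a Diagnose command block (hypothetical helper)."""
    cmd = (word >> CMD_SHIFT) & CMD_MASK
    if cmd != CMD_DIAGNOSE:
        raise ValueError(f"not a Diagnose command: CMD={cmd:#x}")
    return {
        "complete": bool((word >> BIT_C) & 1),
        "busy": bool((word >> BIT_B) & 1),
        "ok": bool((word >> BIT_OK) & 1),
        "failed": bool((word >> BIT_F) & 1),
    }


passed = decode_diagnose(0x0007_A000)  # CMD = 7, C = 1, OK = 1
failed = decode_diagnose(0x0007_8800)  # CMD = 7, C = 1, F = 1
```

A caller would normally wait until `complete` is true before trusting `ok` or `failed`.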
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RECEIVE FRAME DESCRIPTOR 


Each received frame is described by one Receive Frame Descriptor (see Figure 37). Two new memory 
structures are available for the received frames. The structures are available only in the Linear and 32-bit 
Segmented modes.


Simplified Memory Structure 


The first is the Simplified memory structure: the data section of the received frame is part of the RFD and is
located immediately after the Length Field. Receive Buffer Descriptors are not used with the Simplified structure;
it is primarily used to make programming easier. If the length of the data area described in the Size Field
is smaller than the incoming frame, the following happens:


1. The received frame is truncated. 
2. The No Resource error counter is updated. 


3. If the 82596 is configured to Save Bad Frames the RFD is not reused; otherwise, the same RFD is used to 
hold the next received frame, and the only action taken regarding the truncated frame is to update the 
counter. 


4. The 82596 continues to receive the next frame in the next RFD. 
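
The four steps above can be modeled in a few lines. This is a sketch of the described behavior only, assuming a software model in which a frame is a `bytes` object; the names `SimplifiedResult` and `receive_simplified` and the `counters` dictionary are hypothetical and do not correspond to 82596 registers.

```python
from dataclasses import dataclass


@dataclass
class SimplifiedResult:
    stored: bytes      # bytes written to the RFD data area
    truncated: bool    # frame was longer than the Size field allows
    rfd_reused: bool   # the same RFD will hold the next received frame


def receive_simplified(frame: bytes, size: int, save_bad_frames: bool,
                       counters: dict) -> SimplifiedResult:
    """Model steps 1-4 for one incoming frame (hypothetical helper)."""
    if len(frame) <= size:
        return SimplifiedResult(frame, truncated=False, rfd_reused=False)
    # Step 2: the No Resource error counter is updated.
    counters["no_resource"] = counters.get("no_resource", 0) + 1
    # Step 1: the received frame is truncated to the data area size.
    stored = frame[:size]
    # Step 3: the truncated frame is kept only if Save Bad Frames is configured.
    return SimplifiedResult(stored, truncated=True,
                            rfd_reused=not save_bad_frames)


counters = {}
kept = receive_simplified(bytes(100), 64, save_bad_frames=True, counters=counters)
dropped = receive_simplified(bytes(100), 64, save_bad_frames=False, counters=counters)
```

In both cases reception continues with the next frame (step 4); only the choice of RFD differs.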


[Figure 37 diagram (290218-15). Labels: RFA pointer, statistics, command block list, parameters; Receive Frame Area with Receive Frame Descriptors (each with a STATUS field); Receive Buffer Descriptors (valid, empty, ACT=cnt); Receive Buffers 1 through 5; the area is divided into the Receive Frame List and the Free Frame List.]


Figure 37. The Receive Frame Area 
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Note that this sequence is very useful for monitoring. If the 82596 is configured to Save Bad Frames, to 
receive in Promiscuous mode, and to use the Simplified memory structure, any programmed length of received
data can be saved in memory.


The Simplified memory structure is shown in Figure 38. 


TO COMMAND LIST 


[Figure 38 diagram (290218-16). Labels: Receive Frame Area with Receive Frame Descriptors FD1, FD2, ... (each with a STATUS field followed by a variable data field), divided into the Receive Frame List and the Free Frame List.]


Figure 38. RFA Simplified Memory Structure 


Flexible Memory Structure 

The second structure is the Flexible memory structure: the data of the received frame is stored in
both the RFD and in a linked list of Receive Buffers and Receive Buffer Descriptors (RBDs). The received frame is placed
in the RFD as configured in the Size field. Any remaining data is placed in a linked list of RBDs.


The Flexible memory structure is shown in Figure 39.
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> TO COMMAND LIST 
[Figure 39 diagram (290218-17). Labels: RFA pointer; Receive Frame Area with Receive Frame Descriptors FD1 through FD4 (STATUS, control field, variable data); Receive Buffer Descriptors; Receive Buffers 1 through 5; the area is divided into the Receive Frame List and the Free Frame List.]


Figure 39. RFA Flexible Memory Structure 


Buffers on the receive side can be different lengths. The 82596 will not place more bytes into a buffer than
indicated in the associated RBD. The 82596 will fetch the next RBD before it is needed. The 82596 will
attempt to receive frames as long as the Free Buffer List (FBL) is not exhausted. If there are no more buffers, the 82596
Receive Unit will enter the No Resources state. Before starting the RU, the CPU must place the FBL pointer in
the RBD pointer field of the first RFD. All remaining RBD pointer fields for subsequent RFDs should be all 1s. If
the Receive Frame Descriptor and the associated Receive Buffers are not reused (e.g., the frame is properly
received or the 82596 is configured to Save Bad Frames), the 82596 writes the address of the next free RBD
to the RBD pointer field of the next RFD.
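
The way a frame is split between the RFD and buffers of different lengths can be sketched as follows. This is an illustrative software model under stated assumptions, not the device algorithm: the free buffer list is represented as a plain list of buffer sizes, the No Resources state is modeled as an exception, and the name `scatter_frame` is hypothetical.

```python
def scatter_frame(frame: bytes, rfd_size: int, free_buffer_sizes: list):
    """Split a frame between the RFD data area and buffers taken from the FBL.

    Returns (rfd_data, buffer_chunks, remaining_free_sizes). Raises
    RuntimeError to model the Receive Unit entering the No Resources state.
    """
    rfd_data, rest = frame[:rfd_size], frame[rfd_size:]
    chunks, used = [], 0
    while rest:
        if used == len(free_buffer_sizes):
            raise RuntimeError("No Resources: free buffer list exhausted")
        size = free_buffer_sizes[used]
        chunks.append(rest[:size])  # never more bytes than the RBD indicates
        rest = rest[size:]
        used += 1
    # The first unused buffer is where the next frame would start.
    return rfd_data, chunks, free_buffer_sizes[used:]


rfd_data, chunks, still_free = scatter_frame(bytes(range(100)), 14, [32, 64, 16])
```

Here the 100-byte frame leaves 14 bytes in the RFD, fills the 32-byte buffer, and places the remaining 54 bytes in the 64-byte buffer; the 16-byte buffer stays on the free list.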


Receive Buffer Descriptor (RBD) 


The RBDs are used to store received data in a flexible set of linked buffers. The portion of the frame’s data 
field that is outside the RFD is placed in a set of buffers chained by a sequence of RBDs. The RFD points to 
the first RBD, and the last RBD is flagged with an EOF bit set to 1. Each buffer in the linked list of buffers 
related to a particular frame can be any size up to 2^14 bytes but must be word aligned (begin on an even
numbered byte). This ensures optimum use of the memory resources while maintaining low overhead. All 
buffers in a frame are filled with the received data except for the last, in which the actual count can be smaller 
than the allocated buffer space. 
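
Reassembling a frame from its RBD chain follows directly from this description: walk the list, take the actual count of each buffer, and stop at the RBD with EOF set. The sketch below assumes an in-memory Python model; the `Rbd` class and `gather_frame` function are hypothetical names, and real descriptors would be read from shared memory instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rbd:
    data: bytes                   # contents of the associated receive buffer
    actual_count: int             # number of meaningful bytes in the buffer
    eof: bool                     # set on the last RBD of the frame
    next: Optional["Rbd"] = None  # next RBD on the list


def gather_frame(first: Rbd) -> bytes:
    """Concatenate the valid bytes of every buffer up to the EOF RBD."""
    out, rbd = bytearray(), first
    while rbd is not None:
        out += rbd.data[:rbd.actual_count]
        if rbd.eof:
            return bytes(out)
        rbd = rbd.next
    raise ValueError("chain ended before an RBD with EOF set")


last = Rbd(data=b"cd__", actual_count=2, eof=True)
first = Rbd(data=b"ab", actual_count=2, eof=False, next=last)
payload = gather_frame(first)
```

Only the last buffer is partially filled, which matches the statement that the actual count can be smaller than the allocated space only there.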
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ODD WORD EVEN WORD 


31                                 16 15                                 0
EL | S | X X X X X X X X X X X X X X | C | B | OK | 0 | STATUS BITS | 0 0 0 0 0 0      0
A15        RBD OFFSET        A0 | A15        LINK OFFSET        A0      4
4th byte            DESTINATION ADDRESS            1st byte      8
SOURCE ADDRESS 1st byte | 6th byte
6th byte                                4th byte
X X X X X X X X X X X X X X X X | LENGTH FIELD


Figure 40. Receive Frame Descriptor—82586 Mode 


ODD WORD . 16 15 EVEN WORD 0 


31
EL | S | 0 0 0 0 0 0 0 0 0 0 | SF | 0 0 0 | C | B | OK | STATUS BITS
A15        RBD OFFSET        A0 | A15        LINK OFFSET        A0
0 | 0 | SIZE | EOF | F | ACTUAL COUNT
1st byte | 6th byte
4th byte
LENGTH FIELD


OPTIONAL DATA AREA


ODD WORD | EVEN WORD 0 


31
EL | S | 0 0 0 0 0 0 0 0 0 0 | SF | 0 0 0 | C | B | OK | STATUS BITS      0
A31                      LINK ADDRESS                      A0      4
A31          RECEIVE BUFFER DESCRIPTOR ADDRESS          A0      8
0 | 0 | SIZE | EOF | F | ACTUAL COUNT      12
DESTINATION ADDRESS      16
SOURCE ADDRESS 1st byte | 6th byte      20
6th byte                                4th byte      24
LENGTH FIELD      28


OPTIONAL DATA AREA


Figure 42. Receive Frame Descriptor—Linear Mode 
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where: 
EL - When set, this bit indicates that this RFD is the last one on the RDL.
S - When set, this bit suspends the RU after receiving the frame.
SF - This bit selects between the Simplified or the Flexible mode.
    0 - Simplified mode; all the RX data is in the RFD. The RBD ADDRESS field is all 1s.
    1 - Flexible mode. Data is in the RFD and in a linked list of Receive Buffer Descriptors.
C - This bit indicates the completion of frame reception. It is set by the 82596.
B - This bit indicates that the 82596 is currently receiving this frame, or that the 82596
    is ready to receive the frame. It is initially set to 0 by the CPU. The 82596 sets it to
    1 when reception set up begins, and to 0 upon completion. The C and B bits are
    set during the same operation.
OK (bit 13) - Frame received successfully, without errors. RFDs with bit 13 equal to 0 are possible
    only if the Save Bad Frames configuration option is selected. Otherwise all
    frames with errors will be discarded, although statistics will be collected on them.
STATUS - The results of the Receive operation. Defined bits are:
    Bit 12: Length error if configured to check length.
    Bit 11: CRC error in an aligned frame.
    Bit 10: Alignment error (CRC error in misaligned frame).
    Bit 9: Ran out of buffer space (no resources).
    Bit 8: DMA Overrun; failure to acquire the system bus.
    Bit 7: Frame too short.
    Bit 6: No EOP flag (for Bit stuffing only).
    Bit 5: When the SF bit equals zero, and the 82596 is configured to save bad
        frames, this bit signals that the receive frame was truncated. Otherwise it
        is zero.
    Bits 2-4: Zeros.
    Bit 1: When it is zero, the destination address of the received frame matches
        the IA address. When it is a 1, the destination address of the received
        frame did not match the individual address. For example, a multicast
        address or broadcast address will set this bit to a 1.
    Bit 0: Receive collision; a collision was detected during reception.
LINK ADDRESS - A 16-bit offset (32-bit address in the Linear mode) to the next Receive Frame
    Descriptor. The Link Address of the last frame can be used to form a cyclical list.
RBD POINTER - The offset (address in the Linear mode) of the first RBD containing the received
    frame data. An RBD pointer of all ones indicates no RBD.
EOF, F, SIZE, ACT COUNT - These fields are for the Simplified and Flexible memory models. They are exactly
    the same as the respective fields in the Receive Buffer Descriptor. See the next
    section for detailed explanation of their functions.
MC - Multicast bit.
DESTINATION ADDRESS - The contents of the destination address of the receive frame. The field is 0 to 6
    bytes long.
SOURCE ADDRESS - The contents of the Source Address field of the received frame. It is 0 to 6 bytes
    long.
LENGTH FIELD - The contents of this 2-byte field are user defined. In 802.3 it contains the length of
    the data field. It is placed in memory in the same order it is received, i.e., most
    significant byte first, least significant byte second.
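
The STATUS bit assignments listed above map naturally onto a lookup table. The following is a minimal Python sketch: bits 0 through 13 follow the list above, the positions of C (bit 15) and B (bit 14) are an assumption carried over from the Command Block layout, and all identifier names are hypothetical.

```python
# Bit numbers 12..0 are taken from the STATUS list above.
RFD_STATUS_BITS = {
    12: "length_error",
    11: "crc_error",
    10: "alignment_error",
    9: "no_resources",
    8: "dma_overrun",
    7: "frame_too_short",
    6: "no_eop_flag",
    5: "truncated",
    1: "ia_mismatch",
    0: "receive_collision",
}


def decode_rfd_status(status: int) -> dict:
    """Decode the 16-bit status word of an RFD (hypothetical helper)."""
    flags = [name for bit, name in RFD_STATUS_BITS.items()
             if (status >> bit) & 1]
    return {
        "complete": bool((status >> 15) & 1),  # C, position assumed
        "busy": bool((status >> 14) & 1),      # B, position assumed
        "ok": bool((status >> 13) & 1),        # OK, bit 13 per the list above
        "flags": flags,
    }


good = decode_rfd_status(0xA000)                       # C and OK set
bad = decode_rfd_status(0x8000 | (1 << 11) | (1 << 1))  # C, CRC error, IA mismatch
```

A frame with `ok` false can only appear in memory when Save Bad Frames is configured, so a driver that never enables that option may treat it as an invariant violation.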
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NOTES 

1. The Destination address, Source address and Length fields are packed, i.e., one field immediately follows
the next.

2. The effect of the Address/Length Location (No Source Address Insertion) configuration parameter while
receiving is as follows:

- 82586 Mode: The Destination address, Source address and Length fields are not used; they are placed in
the RX data buffers.

- 32-Bit Segmented and Linear Modes: when the Simplified memory model is used, the Destination address,
Source address and Length fields reside in their respective fields in the RFD. When the Flexible memory
structure is used, the Destination address, Source address, and Length field locations depend on the SIZE
field of the RFD. They can be placed in the RFD, in the RX data buffers, or partially in the RFD and the rest
in the RX data buffers, depending on the SIZE field value.


82586 Mode
31          ODD WORD          16 15          EVEN WORD          0
A15      NEXT RBD OFFSET      A0 | EOF | F | ACTUAL COUNT
X X X X X X X X | A23        RECEIVE BUFFER ADDRESS        A0
EL | X | SIZE


32-Bit Segmented Mode
31          ODD WORD          16 15          EVEN WORD          0


Linear Mode
31                                 16 15                                 0
NEXT RBD ADDRESS
RECEIVE BUFFER ADDRESS
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | EL | P | SIZE


Figure 43. Receive Buffer Descriptor 
EOF - Indicates that this is the last buffer related to the frame. It is cleared by the CPU
    before starting the RU, and is written by the 82596 at the end of reception of the
    frame.
F - Indicates that this buffer has already been used. The Actual Count has no meaning
    unless the F bit equals one. This bit is cleared by the CPU before starting the RU,
    and is set by the 82596 after the associated buffer has been used. This bit has the same
    meaning as the Complete bit in the RFD and CB.
ACT COUNT - This 14-bit quantity indicates the number of meaningful bytes in the buffer. It is
    cleared by the CPU before starting the RU, and is written by the 82596 after the
    associated buffer has already been used. In general, after the buffer is full, the
    Actual Count value equals the size field of the same buffer. For the last buffer of
    the frame, Actual Count can be less than the buffer size.
NEXT BD ADDRESS - The offset (absolute address in the Linear mode) of the next RBD on the list. It is
    meaningless if EL=1.
BUFFER ADDRESS - The starting address of the memory area that contains the received data. In the
    82586 mode, this is a 24-bit address (with pins A24-A31=0). In the 32-bit Segmented
    and Linear modes this is a 32-bit address.
EL - Indicates that the buffer associated with this RBD is last in the FBL.
P - This bit indicates that the 82596 has already prefetched the RBDs and any change
    in the RBD data will be ignored. This bit is valid only in the new 82596 memory
    modes, and if this feature has been enabled during the Configure command. The
    82596 prefetches the RBDs in locked cycles; after prefetching the RBD the 82596
    performs a write cycle where the P bit is set to one and the rest of the data remains
    unchanged. The CPU is responsible for resetting it in all RBDs. The 82596 will not
    check this bit before setting it.
SIZE - This 14-bit quantity indicates the size, in bytes, of the associated buffer. This
    quantity must be an even number.
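
The CPU-side preparation implied by these field descriptions (clear EOF, F, ACT COUNT, and P; set EL on the last RBD; keep SIZE even and within 14 bits) can be sketched as below. This is an illustrative model using Python dictionaries rather than the in-memory descriptor format; `init_rbd_chain` and the key names are hypothetical.

```python
def init_rbd_chain(buffer_sizes):
    """Build initial RBD field values for a free buffer list (hypothetical helper)."""
    if not buffer_sizes:
        raise ValueError("at least one buffer is required")
    last = len(buffer_sizes) - 1
    rbds = []
    for i, size in enumerate(buffer_sizes):
        # SIZE is a 14-bit quantity and must be an even number.
        if size % 2 or not 0 < size < (1 << 14):
            raise ValueError(f"buffer {i}: size {size} must be even and fit in 14 bits")
        rbds.append({
            "eof": 0, "f": 0, "actual_count": 0, "p": 0,  # cleared by the CPU
            "size": size,
            "el": int(i == last),              # last buffer in the FBL
            "next": None if i == last else i + 1,  # index of the next RBD
        })
    return rbds


chain = init_rbd_chain([512, 1024, 256])
```

The `next` value of the last entry is left as `None` because the next-RBD field is meaningless when EL=1.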
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PGA PACKAGE THERMAL SPECIFICATION 


ELECTRICAL AND TIMING | 
CHARACTERISTICS 


Absolute Maximum Ratings
• Storage Temperature ........ -65°C to +150°C
• Case Temperature under Bias ........ -65°C to +110°C
• Supply Voltage with Respect to VSS ........ -0.5V to +6.5V
• Voltage on Other Pins ........ -0.5V to VCC + 0.5V


DC Characteristics


TC = 0°C-85°C, VCC = 5V ±10%. LE/BE have MOS levels (see VMIL, VMIH).
All other signals have TTL levels (see VIL, VIH, VOL, VOH).


[DC Characteristics table. Recoverable test conditions: IOL = 4.0 mA; IOH = 0.9 mA-1 mA; 0.45 < VOUT < VCC; FC = 1 MHz; ICC typical = 100 mA at 25 MHz and 150 mA at 33 MHz. Recoverable units: µA, pF, mA.]
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AC Characteristics 


82596CA INPUT/OUTPUT SYSTEM TIMINGS 


TC = 0°C-85°C, VCC = 5V ±10%. These timings assume the CL on all outputs is 50 pF unless otherwise
specified. CL can be 20 pF to 120 pF; however, timings must be derated. All timing requirements are given in
nanoseconds.


[Timing table. Recoverable entries: Operating Frequency (12.5 MHz to 25 MHz, 1X CLK Input); T1 CLK Period; T1a CLK Period Stability (0.1%, adjacent CLK); CLK High; CLK Low; CLK Rise Time (0.8V to 2.0V); CLK Fall Time (2.0V to 0.8V); BEn, LOCK, and A2-A31 Valid Delay; BLAST, PCHK Valid Delay; BEn, LOCK, BLAST, A2-A31 Float Delay; W/R and ADS Valid Delay; T12 HOLD Valid Delay; D0-D31 CPU PORT Access Setup Time.]
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T18 3 

5 
T20 3 
T21 10 
T22 3 

3 
T23 10 
T24 3 
T25 ——- 
T26 271 
T27 a: 
T28 3 
T29 7 
T30 3 
T31 10 
T32 3 
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AC Characteristics (Continued) 


82596CA INPUT/OUTPUT SYSTEM TIMINGS

TC = 0°C-85°C, VCC = 5V ±5%. These timings assume the CL on all outputs is 50 pF unless otherwise
specified. CL can be 20 pF to 120 pF; however, timings must be derated. All timing requirements are given in
nanoseconds.


[Timing table. Recoverable entries: T2 CLK High; CLK Rise Time (0.8V to 2.0V); CLK Fall Time (2.0V to 0.8V); T6 BEn, LOCK, and A2-A31 Valid Delay; BLAST, PCHK Valid Delay; BEn, LOCK, BLAST, A2-A31 Float Delay; W/R and ADS Valid Delay; W/R and ADS Float Delay; T10 D0-D31, DPn Write Data Valid Delay; T12 HOLD Valid Delay; T13 CA and BREQ Setup Time; CA and BREQ Hold Time; T15 BS16 Setup Time; T16 BS16 Hold Time; T17 BRDY, RDY Setup Time; T20 D0-D31, DPn READ Hold Time; AHOLD Hold Time.]
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AC Characteristics (Continued) 
82596CA INPUT/OUTPUT SYSTEM TIMINGS 


CL on all outputs is 50 pF unless otherwise specified.
All timing requirements are given in nanoseconds.


Parameter

HLDA Hold Time
D0-D31 CPU PORT Access Setup Time
D0-D31 CPU PORT Access Hold Time


NOTES:
1. RESET, HLDA, and CA are internally synchronized. This timing is to guarantee recognition at next clock for RESET, HLDA
and CA.
2. All set-up, hold and delay timings are at maximum frequency specification Fmax, and must be derated according to the
following equation for operation at lower frequencies:
Tderated = (Fmax/Fopr) × T
where:
Tderated = Specifies the value to derate the specification.
Fmax = Maximum operating frequency.
Fopr = Actual operating frequency.
T = Specification at maximum frequency.
This calculation only provides a rough estimate for derating the frequency. For more detailed information, contact your
Intel Sales Office for the data sheet supplement.
3. CA pulse width need only be 1 T1 wide if the set up and hold times are met; BREQ must meet setup and hold times and
need only be 1 T1 wide.
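
The derating equation in note 2 is simple enough to check numerically. The sketch below implements it as written; the function name `derate` and the example values are illustrative assumptions, and the result is only the rough estimate the note describes.

```python
def derate(t_spec_ns: float, f_max_mhz: float, f_opr_mhz: float) -> float:
    """Apply Tderated = (Fmax / Fopr) * T from note 2 (hypothetical helper).

    t_spec_ns is the specification at maximum frequency, in nanoseconds.
    """
    if not 0 < f_opr_mhz <= f_max_mhz:
        raise ValueError("operating frequency must be in (0, Fmax]")
    return (f_max_mhz / f_opr_mhz) * t_spec_ns


# Illustrative values only: a 10 ns specification at 25 MHz, operated at 12.5 MHz.
t_derated = derate(10.0, 25.0, 12.5)
```

Halving the operating frequency doubles the estimated timing value, so the 10 ns figure in this example becomes 20 ns.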


TRANSMIT/RECEIVE CLOCK PARAMETERS 


[Transmit clock parameter table. Recoverable entries: TxC Fall Time; T41 TxC Low Time; T42 TxD Rise Time; TxD Fall Time; TxC High to TxD Transition; T48 TxC Low to TxD High (At End of Transition).]
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TRANSMIT/RECEIVE CLOCK PARAMETERS (Continued) 


RTS AND CTS PARAMETERS 


TxC Low to RTS Low, Time to Activate RTS
TxC Low to CTS Invalid, CTS Hold Time
T52 TxC Low to RTS High


RECEIVE CLOCK PARAMETERS

RXC Cycle
RXC Rise Time
RXC Fall Time
RXC High Time
RXC Low Time


CDT Low to Jam Start
CRS Low to TXC High, Carrier Sense Setup Time
TXC High to CRS Inactive, CRS Hold Time (Internal Collision Detect)
CRS High to Jamming Start, Jamming Period
CRS High to RXC High, CRS Inactive Setup Time
RXC High to CRS High, CRS Inactive Hold Time
TRANSMIT/RECEIVE CLOCK PARAMETERS (Continued)


INTERFRAME SPACING PARAMETERS 


Interframe Delay


EXTERNAL LOOPBACK-PIN PARAMETERS 


NOTES: 
1. Special MOS levels. VCIL = 0.9V and VCIH = 3.0V.
2. Manchester only.
3. Manchester. Needs 50% duty cycle.
4. 1 TTL load + 50 pF.
5. 1 TTL load + 100 pF.
6. NRZ only.
7. Abnormal end of transmission: CTS expires before RTS.
8. Normal end to transmission.
9. Programmable value:
T71 = NIFS × T36
where: NIFS = the IFS configuration value
(if NIFS is less than 12 then NIFS is forced to 12).
10. Programmable value:
T64 = (NCDF × T36) + x × T36
(if the collision occurs after the preamble)
where:
NCDF = the collision detect filter configuration value,
and
x = 12, 13, 14, or 15
11. T68 = 32 × T36
12. Programmable value:
T67 = (NCSF × T36) + x × T36
where: NCSF = the Carrier Sense Filter configuration
value, and
x = 12, 13, 14, or 15
13. To guarantee recognition on the next clock.
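
The programmable values in notes 9 through 12 can be evaluated with a few helper functions. This is a sketch under stated assumptions: T36 is taken to be the serial clock period and set to 100 ns (a 10 Mb/s example value that is not stated in these notes), and the function names are hypothetical. Because x may be 12, 13, 14, or 15, T64 and T67 are returned as (minimum, maximum) pairs.

```python
MIN_IFS = 12  # note 9: smaller IFS configuration values are forced to 12


def t71_interframe(n_ifs: int, t36: int) -> int:
    """Note 9: T71 = NIFS * T36, with NIFS forced to at least 12."""
    return max(n_ifs, MIN_IFS) * t36


def t64_range(n_cdf: int, t36: int) -> tuple:
    """Note 10: T64 = (NCDF * T36) + x * T36 for x = 12..15."""
    return ((n_cdf + 12) * t36, (n_cdf + 15) * t36)


def t67_range(n_csf: int, t36: int) -> tuple:
    """Note 12: T67 = (NCSF * T36) + x * T36 for x = 12..15."""
    return ((n_csf + 12) * t36, (n_csf + 15) * t36)


def t68(t36: int) -> int:
    """Note 11: T68 = 32 * T36."""
    return 32 * t36


T36_NS = 100  # ASSUMED example value: one serial clock period at 10 Mb/s
ifs_ns = t71_interframe(96, T36_NS)
```

With these assumed inputs an IFS configuration value of 96 gives 9600 ns, and any configuration value below 12 gives the same result as 12.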


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82596CA BUS OPERATION | 
The following figures show the 82596CA basic bus cycle and basic burst cycle. 


Please refer to the 32-Bit LAN Component User’s Manual. 




290218-41 


Figure 45. Basic 82596CA Burst Cycle 
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SYSTEM INTERFACE A.C. TIMING CHARACTERISTICS 


The measurements should be done at: 

• TC = 0°C-85°C, VCC = 5V ±10%, C = 50 pF unless otherwise specified.

• A.C. testing inputs are driven at 2.4V for a logic “1” and 0.45V for a logic “0”.
• Timing measurements are made at 1.5V for both logic “1” and “0”.


• Rise and Fall time of inputs and outputs signals are measured between 0.8V and 2.0V respectively unless
otherwise specified.


• All timings are relative to CLK crossing the 1.5V level.
• All A.C. parameters are valid only after 100 µs from power up.


[Drive level diagrams (290218-18, 290218-19): inputs driven between 2.4V and 0.45V, with the test point at 1.5V.]


Figure 46. CLK Timings 


Two types of timing specifications are presented below: 
1. Input Timing—minimum setup and hold times. 
2. Output Timings—output delays and float times from CLK rising edge. 


Figure 47 defines how the measurements should be done: | 


290218-20
LEGEND:
Ts = Input Setup Time

Th = Input Hold Time

Tn = Minimum output delay or Minimum float delay

Tx = Maximum output delay or Maximum float delay


Figure 47. Drive Levels and Measurements Points for A.C. Specifications | 


Ts = T13, T15, T17, T19, T21, T23, T27, T29, T31

Th = T14, T16, T18, T20, T22, T22a, T24, T28, T30, T32
Tn = T6, T6a, T7, T8, T9, T10, T11, T12, T25

Tx = T6, T6a, T7, T8, T9, T10, T11, T12, T25
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__ INPUT WAVEFORMS 


290218-21 


Figure 48. CA and BREQ Input Timing © 


INT/INT x : | X—1 5V 


Figure 49. INT/INT Output Timing 


290218-22 


290218-23 


DP3=DPO 


BS16 
290218-24 


Figure 51. Input Setup and Hold Time - 




[Waveform (290218-25). Signals: A31-A2, BEn, LOCK (T6); PCHK, BLAST (T6a); W/R, ADS; D31-D0, DP3-DP0 (output). Each output changes from one valid state to the next between the MIN and MAX delays.]


Figure 52. Output Valid Delay Timing 


[Waveform (290218-26). Signals: A31-A2, BEn, LOCK, BLAST (VALID n); D31-D0 (output).]


| 290218-27 


Figure 54. PORT Setup and Hold Time 
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290218~28 


Figure 55. RESET Input Timing 


SERIAL AC TIMING CHARACTERISTICS 


290218-29 


[Waveform: TXD (NRZ) and TXD (Manchester), with parameter T44.]


290218-30 


Figure 57. Transmit Data Waveforms — 
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CRS 


[Waveform: TXD (NRZ) and TXD (Manchester), shown relative to CRS.]
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—>| |<+-T60,761 
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Figure 59. Receive Data Waveforms (NRZ) 
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Figure 60. Receive Data Waveforms (CRS) 
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OUTLINE DIAGRAMS 


132 LEAD CERAMIC PIN GRID ARRAY PACKAGE INTEL TYPE A 


[Package outline drawing (290218-34). Labels: seating plane, swagged pin detail (all pins), 45° chamfer at the index corner; dimensions in mm (inch).]


- Solid Lid 


IWS 10/12/88 | 
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82596CA 


Intel Case Outline Drawings 


Plastic Quad Flat Pack (PQFP) 


0.025 Inch (0.635mm) Pitch 


Description 


Symbol 


Leadcount 


oO 


oO 


Package Height 0.16 
Standoff 


Oo 


Ww) 


SP) 


Package Body 0.54 


Nm 


D4,E4 |Foot Radius Location) 0.623/0.637]0.723)0.737)| 0.823) 0.837) 1.023) 1.037) 1.223) 1.237] 1.423] 1.43 


D3, E3 {Lead Dimension 


Foot Length 0.020/0.030/0.020]0.030/ 0.020] 0.030] 0.020/0.030]0.020/0.030/0.020]0.03 


Symbol 


IWS Preliminary 12/12/88 


Description 


Leadcount 


1 


0.5 


Package Height 4.06 
Standoff 


AN 


NK 


9 


Package Body 13.8 
E2 |Bumper Distance 17.7 


© 


0 


Foot Length 
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: 


/N\-H-| ;>—-BASE PLANE 
be Al 


i ED-/3\ 
PN APDOARRRDDRDHADARORODO ADT 
a eS fe 


TO 


hpe WET EPIL ae 
DOOURB EY Rona 


=C-ISEATING PLANE 


NC BURG 


mm (inch) 
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DO 


3.81 (.150) MAX TYP


SEE DETAIL M


1.91 (.075) MAX TYP


| 
| L| 682 MM/MM “CIN/IN) [D_ 


DOA 
mm (inch) LL | 282 MM/MM_CINZIN) [DI 


290218-36 


Figure 62. Molded Details 
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SEE DETAIL L 


SEE DETAIL J 


290218-37 


0.13 (.005) | C | A-B | D


0.41 (.016)
0.20 (.008)


0.31 (.012)
0.20 (.008)


0.20 (.008) | C | A-B | D


mm (inch) 290218-38 | 


Detail J Detail L 


Figure 64. Typical Lead 
1.32 (.052)
1.22 (.048)


0.90 (.035) MIN.


1.32 (.052)
1.22 (.048)


0.90 (.035) MIN.
2.03 (.080)
1.93 (.076)


2.03 (.080)
1.93 (.076)


290218-39 


mm (inch) 


Figure 65. Detail M 
COMPREHENSIVE SOFTWARE DEBUG SUPPORT FOR i960™
EMBEDDED APPLICATIONS


Intel provides comprehensive software debug support for all members of the i960™
component architecture, including the newest members, the i960SA and i960SB. All
Intel’s i960 software debug products share the same high-level, windowed user interface
emerging as the standard for all i960 tools from Intel. This innovative debug interface
allows users to focus their efforts on finding bugs rather than spending time learning and
manipulating the debug environment.


Intel’s i960 software debug tools support a wide variety of debug environments, including 
code debug on a simulated target environment, a PC-based evaluation board, a serial- 
based Intel evaluation board, or a serial-based, customized target system. 


GENERAL i960 SOFTWARE DEBUGGER FEATURES 


• Windowed, pull down menu user
interface shared by other i960
Development Tools

• Full symbolic debug with source level
display allows C or assembly code
debugging

• Debugging productivity enhanced by
ability to quickly browse source code and
view call stacks or symbol run-time
values

• Breakpoints may be defined symbolically
using module names, procedure names
and line numbers

• Single step execution, code assembly/
disassembly, memory and register
display/modification

• Run-time library support allows
programs to access host files and perform
I/O


*IBM, PC/AT, and Personal System/2 are registered trademarks of International Business Machines Corporation.
*Compaq is a registered trademark of the Compaq Corporation.
*Intel is a registered trademark of the Intel Corporation.


November 1991 
Order Number: 280916-002 
FEATURES


EASY TO USE, POWERFUL
USER INTERFACE


All i960 debuggers share the same high-level, 
powerful user interface as other i960 
development tools. Utilizing pulldown menus, 
users have access to a color, windowed 
environment featuring source-level, symbolic 
debugging. Multiple, non-overlapping windows 
can be used to display source code, registers, 
variable values, and command line entries. 


DEBUGGING FEATURES 


High-level source or disassembled code can be
displayed in the source window. Users can
scroll through the source, browse from module
to module in a program, scope to any
executable point in the source, or
instantaneously relocate from a symbol name
to the location where it was defined
(hyperscope operation). Symbol names in the
source can be highlighted to inspect the
current run-time value of program variables.
Call stacks can be examined to trace execution
flow.


A variety of breakpoints can be specified
including source breakpoints, watch points,
passpoints, or event-action breakpoints.
Breakpoints can be defined symbolically using
module names, procedure names and line
numbers. Watch points allow users to observe
a variable as it changes during program
execution. Passpoints display a message when
a specified instruction is executed, giving the
user a non-realtime way to track execution of
key code sequences without halting instruction
flow. The event-action form allows complex
breakpoint conditions to be set up, including
data breakpoints (when supported by on-chip
registers).


Users can step through program execution via
a single assembly language instruction, a high-level
language statement or a high-level
function or procedure. Memory can be
displayed or modified as common data types
and all processor registers and system tables
can be examined or changed.


Expressions involving symbol names, memory
references, or both, can be defined as watch
expressions whose values are monitored in a
Watch window as a program executes. The
i960 family of software debuggers also allows
screen flipping between the debugger
environment and the display output from the
program.


Low level, run time libraries are provided that 
allow programs running on an i960 board to 
access the file system on the host or to perform 
I/O operations. 


RETARGETABLE SOFTWARE 


DEBUGGER 


Intel’s DB-960 Retargetable Software 
Debugger is a combination application and 
system level debugger designed for use with 
the i960 family of embedded microprocessors. 
DB-960’s retargetable monitor can be 
customized to a target system, allowing source- 
level, symbolic debug across a serial interface 
cable. 


RETARGETABLE MONITOR 


Utilizing a combination of object files and
source code, a retargetable monitor is provided
with DB-960 for users to customize and
incorporate into their proprietary target
systems. This retargetable monitor is designed
to support all members of the i960 family. Most
of the monitor code is provided in object code
and does not need to be changed.
Hardware-dependent source code is supplied for
modification by users. Example code is
provided for porting the monitor to the Intel
EV80960CA and QT960 target boards. Both
boards use an Intel 82510 UART serial
controller chip and the Intel 82C54 Counter/Timer.


HARDWARE DEBUG


DB-960 takes advantage of on-chip debug
registers like those found on the i960CA to
provide two hardware execution address
breakpoints and two data address breakpoints.
Once the monitor has been retargeted to the
target system, hardware designers can
download initialization code, read/write to
registers and examine memory or register
contents.


HIGH SPEED SERIAL LINK 


DB-960 communication between the host and
target system is supported via RS232 and
RS422 communication links. RS232 allows
access to industry standard serial protocols
while the RS422 interface provides higher
speed communication (up to 115K baud) for
faster code and data download.
PC-AT bus-compatible RS422 communication boards are
available from various third party vendors.
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CUSTOMIZED ENVIRONMENT 


Because the user has control over the target 
board and serial driver source code, a highly 
customized target environment can be 
developed. Serial communication functions can 
be modified to allow for parallel 
communication schemes, allowing faster 
download speeds. 


LICENSING 


There are no incorporation or royalty fees for 
customers shipping the retargeted DB-960 
monitor with their product or system. 


PC-BASED SOFTWARE 
DEBUGGER 


The DB960KBDEVA Software Debugger is 
designed for debugging i960KA or i960KB code 
executing on an Intel EVA-960KB4MB 
Software Execution Vehicle plugged into PC- 
ATs or compatibles using DOS. 
DB960KBDEVA offers the same powerful
debug user interface as other i960 software
debuggers and utilizes I/O resources provided
by the PC. Due to compatibility with the 
i960KA and i960KB, i960SA and i960SB code 
can be executed and debugged using the Intel 
EVA-960KB4MB Software Execution Vehicle 
in conjunction with the DB960KBDEVA 
Software Debugger. 


SIMULATOR-BASED 
SOFTWARE DEBUGGER 


The DBSIM960 Debug Simulator combines an 
i960 CA/KA/SA instruction-level simulator
with the easy to use, powerful DB960 software 
debugger interface. Users can debug i960 
applications without a hardware target system 
being available, allowing products to get to 
market sooner. For i960 CA designs, 
performance information is provided, with 
timing profiles accurate to plus or minus 5%. 


Users can specify the target system’s clock 
speed and wait-state information for each 
region of memory.* DBSIM960 uses this 
information to provide i960 CA performance 
statistics. DBSIM960 expects COFF executable 
files generated by Intel’s CTOOLS960 compiler 
and assembler. Execution flow can be 
monitored by using a trace capability, which 
reports the 8 digit cycle address, 8 digit 
instruction pointer value, and the 
disassembled instruction for each operation. 
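
As a rough illustration of how a clock speed and per-region wait-state settings can be turned into timing figures, the sketch below estimates the cost of individual memory accesses. It is not DBSIM960's algorithm: the region selection by the upper four address bits, the two-cycle base access cost, and all function names are assumptions made for this example.

```python
# Hypothetical timing model for illustration only; not DBSIM960's implementation.
REGION_COUNT = 16            # the text mentions 16 memory regions
BASE_CYCLES_PER_ACCESS = 2   # assumed cost of a zero wait-state access


def region_of(address: int) -> int:
    """Assume the top 4 bits of a 32-bit address select the memory region."""
    return (address >> 28) & 0xF


def access_cycles(address: int, wait_states: list) -> int:
    """Cycles for one access: assumed base cost plus the region's wait-states."""
    return BASE_CYCLES_PER_ACCESS + wait_states[region_of(address)]


def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to elapsed time at the given clock rate."""
    return cycles / clock_hz


wait_states = [0] * REGION_COUNT
wait_states[0xA] = 3  # example: slower memory mapped in region 0xA

total = (access_cycles(0x00001000, wait_states)
         + access_cycles(0xA0000000, wait_states))
elapsed = cycles_to_seconds(total, 25e6)  # 25 MHz clock chosen for the example
```

Changing one entry of `wait_states` and recomputing `elapsed` mirrors, in miniature, the kind of what-if comparison the simulator is described as supporting.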


Program execution statistics reported 
include: 

• Total number of instructions executed
• Total time
• Number of times a call caused processor to
  write registers to external memory
• Current clock setting in cycles per second
• Current wait-state setting for each of the 16
  memory regions
• Number of instruction words executed from
  cache rather than external memory
• Total number of cycles elapsed
• Number of stack frames or register sets
  cached on chip
• Number of times an unaligned load or store
  operation occurred
• Bus utilization
• Branch prediction efficiency
• Usage for load, store, call and branch cache
  instructions


Generally, DBSIM960 provides all the full
symbolic debug capabilities found in the i960
family of debug tools, while providing a
complete benchmarking environment prior to
target system availability.

*By being able to easily change the waitstate definition for
their code, the user’s hardware and software design can be
optimized before any hardware development takes place.


IN-CIRCUIT DEBUG MONITOR 


Intel’s DB960CADIC in-circuit debug monitor 
hosted on extended DOS/386 allows users to 
debug high-speed, cached applications at the 
full speed of the i960CA target processor. 
DB960CADIC can be used by both hardware 
and software developers, at any stage of design. 
Early in the development process, 
DB960CADIC allows software debugging when
inserted into an existing i960CA board such as 
the EV80960CA, or in the DB960CASAST 
stand-alone self-test unit. Later in the design 
cycle, DB960CADIC can be inserted into the 
user’s target system, facilitating debug of 
hardware/software integration. 


DB960CADIC offers the same windowed debug
user interface as other i960 software debuggers
and is also available with an optional 4 MB
standalone self test chassis to debug and test
code before prototype hardware is available.
For further information, see fact sheet
#280900 from Intel.


SOFTWARE COMPLETES THE 
SYSTEM 


Intel provides a comprehensive software
development environment to complement DB-960.
This environment includes a C Compiler,
an i960 Assembler, a system generator for
automating the compilation process and
instruction-level simulators. The languages
support the entire range of i960
processors.


WORLDWIDE SERVICE,
SUPPORT, AND TRAINING


To augment its development tools, Intel offers
a full array of seminars, classes, workshops,
field application engineering expertise, hotline
technical support, and on-site service.


HOST SYSTEM REQUIREMENTS


Host system requirements to run Intel’s i960
family of software debuggers include the
following:

• DOS version 3.3 or later, excluding DOS 4.0
• 640K bytes of RAM in conventional memory
• A fixed disk drive with at least 1.25M bytes
  of free disk space
• One disk drive capable of reading 5.25 inch,
  360K byte disks
• RS232 serial port (COM1 or COM2)


SPECIFICATIONS AND REQUIREMENTS 


Intel also offers a Software Support Contract
which includes technical software information,
automatic distributions of software and
documentation updates, iCOMMENTS
publication, remote diagnostic software, and a
development tools troubleshooting guide.


Intel’s 90-day Hardware Support package 
includes technical hardware information, 
telephone support, warranty on parts, labor, 
material, and on-site hardware support. 


Intel Development Tools also offers a 30-day, 
money-back guarantee to customers who are 
not satisfied after purchasing any Intel 
development tool. 


Evaluated Systems include:

IBM PC-AT* with DOS 3.3
COMPAQ 386* with DOS 3.3
Intel 3074,02* with DOS 3.3
IBM Personal System/2* Model 70/80 with DOS 4.01
ORDERING INFORMATION


DB960KBDEV      DOS-based, retargetable software debugger for the
                i960KA, i960KB, i960SA, i960SB and i960CA
                embedded microprocessors. Includes host debug
                software, retargetable monitor, host I/O libraries
                and documentation.

DB960KBDEVA     DOS-based source level debugger for the i960KA,
                i960KB, i960SA and i960SB embedded
                microprocessors. Requires EVA-960KB4MB Software
                Execution Vehicle and PC-AT compatible bus.

DBSIM960D       DOS/386-hosted debug simulator for the i960 CA,
                i960 KA and i960 SA which utilizes an i960 CA
                instruction-level simulator allowing code
                development and debug prior to hardware prototype
                availability.

DBSIM960S       UNIX System V/386-hosted debug simulator for the
                i960 CA, i960 KA and i960 SA which utilizes an
                i960 CA instruction-level simulator allowing code
                development and debug prior to hardware prototype
                availability.

DBSIM960R       IBM RS/6000-hosted debug simulator for the
                i960 CA, i960 KA and i960 SA which utilizes an
                i960 CA instruction-level simulator allowing code
                development and debug prior to hardware prototype
                availability.

DB960CADIC      DOS/386 hosted in-circuit debug monitor for i960CA
                only. Includes small board with i960CA processor,
                system debug monitor and serial interface. Plugs
                into i960CA socket on hardware prototype system.

DB960CASAST     Standalone Self Test Unit for DB960CADIC. Includes
                built-in power supply, self-test board, 4M byte of
                usable DRAM for code development and enclosure.


To order your Intel Development Tool product,
for more information, or for the number of 
your nearest sales office or distributor, call 
800-874-6835 (North America). For literature 
on other Intel products call 800-548-4725 
(North America). Outside of North America, 
please contact your local Intel sales office or 
distributor for more information. 


EXV-960MC EXECUTION VEHICLE


280879-1 


80960MC-BASED TARGET SYSTEM SUPPORTING EARLY 
SOFTWARE DEVELOPMENT AND BENCHMARKING 
EXV-960MC is a software execution vehicle designed to support 80960MC-based designs.
Users can use the EXV-960MC board to execute and debug their application software
before a functional hardware prototype is available. The EXV-960MC is also designed
with programmable waitstate SRAM to support benchmarking activities. The EXV-960MC
is supported by the complete set of Intel C, assembler and Ada code generation
tools. Both of the VAX/VMS*-hosted 80960MC software debuggers, the SDM-960MC
system debug monitor and the Ada-960MC source-level debugger, can be used for
debugging software running on the EXV-960MC.


EXV-960MC includes a Multibus I form factor board and a set of SDM-960MC target
monitor EPROMs. The SDM-960MC and the Ada-960MC debugger are preconfigured to
support the EXV-960MC execution environment. Designers can select the software
debugger best suited to their development needs. The Ada-960MC debugger is a
source-level symbolic debugger which provides a productive debugging environment for Ada
applications. The SDM-960MC debug monitor offers a complete debugging facility for
applications written in C, assembler or Ada.


*VAX/VMS is a trademark of Digital Equipment Corp. 


December 1990 
SDM-960MC RETARGETABLE SYSTEM DEBUG
MONITOR


FEATURES

• 25 MHz 80960MC processor
• 256 Kbytes of (0,0,0,0) programmable wait-state SRAM
• 4 Mbytes dual-ported (3,1,1,1) wait-state DRAM
• iSBX™ interface
• Two serial ports, one bi-directional parallel port
• 8254 programmable interval timer
• 8259A programmable interrupt controller


ELECTRICAL CHARACTERISTICS 
10A @ +5V
500mA @ +12V
50mA @ -12V


ENVIRONMENTAL CHARACTERISTICS 
Operating temperature: 0° to +60°C (32° to 140°F), 300 LFM
Operating Humidity: 10% to 90% non-condensing 


SOFTWARE DEBUGGING SUPPORT 


The SDM-960MC is a VAX/VMS*-hosted system debug monitor that provides a complete, flexible
environment to execute and debug 80960MC-based applications. Users can tailor the execution 
environment as software development evolves. Initially, the application may require the full 
support of the system debug monitor to establish a run-time environment. As the application 
evolves, the SDM-960MC allows the application to take more of the responsibility for system 
functions. 


The default execution environment of the SDM-960MC is the EXV-960MC execution vehicle. The
VAX-hosted portion of the SDM-960MC debug monitor provides complete on-target debugging 
support through its interface with the target-resident portion of the SDM-960MC. To facilitate 
debugging on a user’s custom target system, the SDM-960MC includes source and object files 
necessary to reconfigure the target monitor. SDM-960MC and other 80960MC development tools 
allow the developers to take full advantage of the 80960MC processor. 


FEATURES

• assemble and disassemble 80960MC instructions
• single step program execution
• access to memory and processor resources
• support 64 execution breakpoints
• issue Interagent Communications (IACs)
• powerful execution trace
• serial download


HARDWARE REQUIREMENTS

• contiguous 50 Kbytes of RAM


WORLDWIDE SERVICE AND SUPPORT

Intel augments its 80960 architecture family
development tools with a full array of
seminars, classes, and workshops; on-site
consulting services and telephone support are
available at all stages of development.


ORDERING INFORMATION

Product Code    Description

EXV960MC        80960MC execution vehicle
                (board and target EPROM)

SDM960MC        VAX, MicroVAX/VMS hosted System Debug
                Monitor, retargetable source is included
280906-1


COMPREHENSIVE DEVELOPMENT SUPPORT FOR 80960SA/SB
EMBEDDED APPLICATIONS


Intel provides comprehensive development support for the 80960 component
architecture, including the newest members, the 80960SA and 80960SB. Tools range from
compilers to simulators and from debuggers to emulators, all designed specifically for
members of the 80960 family, allowing you to take full advantage of their RISC-based
design while reducing time to market.


DEVELOPMENT TOOLS AVAILABLE: 


• ASM-960 macro assembler for developing and tuning speed-critical code
• iC-960 highly optimizing C language compiler for high-level language
  software development
• GEN-960 system generator for initializing your design to take
  advantage of 80960 on-chip features
• DB/SIM960KA debug simulator for 80960KA and 80960SA applications
• Windowed, interactive, source-level DB-960 debugger which can be
  targeted to one of the evaluation and development boards below, or
  customized to your target system
• Evaluation and development boards including the EV960SB, the
  QT80960KB, and the EVA960KB
• ICE-960SA/SB offers a full featured in-circuit emulator for the
  80960SA/SB components
November 1990 
80960SA/SB DEVELOPMENT SUPPORT
ASM-960 MACRO ASSEMBLER


The ASM-960 macro assembler is used to fine-tune
sections of code for peak program
execution speed on the 80960SA, 80960SB,
80960KA, 80960KB, 80960MC, and 80960CA.
ASM-960 does this by giving you absolute
control over program instructions. In addition
to the assembler and macro preprocessor,
ASM-960 includes several utilities for
application program maintenance and debug:

• LINKER provides incremental program
  linking/locating and link-time optimization.
• ARCHIVER allows you to build reusable
  function libraries for applications.
• DISASSEMBLER produces assembly
  language from object files.
• SYMBOL DUMPER provides symbolic
  information from a program file for
  facilitating low-level debug.
• ROM IMAGE BUILDER produces a hex file
  suitable for PROM programmers.
• Macro preprocessor provides code generation
  flexibility and improves code readability,
  reducing maintenance costs.


A Floating Point Arithmetic Library (FPAL) is 
included for the 80960SA, 80960KA, and 
80960CA components. It eliminates the need to 
develop your own floating point code. 


GEN-960 SYSTEM GENERATOR

The 80960 System Generator (GEN-960) helps
you set up data structures for standalone,
embedded applications that use the on-chip
features of the 80960 architecture. GEN-960 is
used with other 80960 tools to generate and
refine ROM or RAM code. GEN-960 supplies a
set of command and template files containing
assembly code and linker control commands to
set up processor control blocks, inter-agent
communication mechanisms, system procedure
tables, and other requirements for
initialization. The result is a batch file
containing all the commands needed to
compile, assemble and link the final target
system.

• Improves engineering productivity by
  automating the compilation, assembly and
  linking process
• Supplies sample initialization code, reducing
  programming time
• Saves engineering time by simplifying the
  task of initializing each processor for on-chip
  capabilities
iC-960 COMPILER


iC-960 is a highly optimizing C language
compiler for the 80960 family of
microprocessors. iC-960 supports the full C
language as described in the Kernighan and
Ritchie book, The C Programming Language
(Prentice-Hall, 1978). iC-960 includes standard
ANSI extensions to the C language and is used
in conjunction with ASM-960 for creating
object files.


The iC-960 compiler supports a number of
processor dependent optimizations including
global register allocation, constant
propagation, arithmetic identity folding,
redundant load/store elimination, strength
reduction and register allocation/scheduling of
arguments. Processor independent
optimizations include common sub-expression
elimination, folding of constant expressions,
elimination of superfluous branches, removing
unreachable code, tail recursion and procedure
incorporation.


iC-960 includes a standard C library with I/O
functions and mathematical routines. A second
library provides low level, environment-dependent
routines emulating UNIX* system
calls and supplies I/O routines for the EVA-960
Software Execution Vehicle.


iC-960 also includes the following 
enhancements for embedded application 
development: 


• Programs may be easily placed in ROM.
• Memory-mapped I/O allows high-level
  language access to application-specific input
  and output.
• In-line assembly simplifies the integration of
  C language and assembly code for speed-critical
  functions.
• Floating point support produces in-line code
  to take full advantage of the floating point
  capability of the 80960SB, 80960KB and
  80960MC.


Symbolic debugging of source code for iC-960
and ASM-960 is provided by the DB-960 Source
Level Debugger, the DBSIM960KA debugging
simulator, the DB960CADIC in-target
debugger, and the ICE960SB and ICE960KB
emulators.


DEBUGGING SIMULATOR 


The DBSIM960KA simulator features an easy
to use, pulldown menu user interface combined
with an 80960SA/80960KA instruction
simulator. DBSIM960KA facilitates debugging
80960SA and 80960KA applications by
providing debugging capabilities before target
hardware is available. DBSIM960KA’s
powerful, windowed, source-oriented interface
allows you to focus your efforts on finding bugs
rather than on learning and manipulating the
debug environment.


Ease of learning. Drop-down menus make the
debugger easy to learn for new or casual users.
A command line interface allows direct
command entry for solving more complex
problems, improving productivity of
knowledgeable users.


Extensive debug modes. You can set
conditional breakpoints, pass points, and
temporary breakpoints as needed.


See into your program. Using pull-down
menus or function keys, you can browse source
and Call stacks, monitor processor registers,
view screen output, and watch the values of
variables change.


Full debug symbolics for maximum
productivity. You need not know whether a
variable is an unsigned integer, a real, or a
structure: the debugger displays program
variables in their respective type formats.


EVA-960KB4MB SOFTWARE 
EXECUTION VEHICLE 


The EVA-960KB4MB is a software execution
vehicle for the 80960KA/KB microprocessor. It
is a single PC AT plug-in board which provides
easy and convenient architecture evaluation
and benchmarking, as well as software
development. Since the board uses an
80960KB, 80960SA and 80960SB performance
can be extrapolated. The EVA-960KB4MB
contains the following:

• 4 MB or 16 MB (EVA960KB16MB) of one
  wait-state program memory (DRAM)
• 64 Kbytes of zero wait-state program
  memory (SRAM)
• Three-channel programmable interval timer
• Hosted debug monitor which supports two
  hardware and 64 software breakpoints,
  single-step program execution, register and
  memory access, program download and
  upload
• DOS access libraries that allow screen
  display, keyboard input, reading and writing
  disk files, and the ability to spawn a DOS
  process that could communicate with serial or
  parallel I/O
• 20 MHz operation, allowing software to
  operate at full speed of 80960KB


EVA-960KB4MB also operates with the DB-960
Source Level Debugger for code
development/debug prior to target system
availability.


SOURCE-LEVEL DEBUGGER


The DB-960 Debugger with source-level debug
capabilities is available for PC ATs equipped
with DOS. DB-960 can debug 80960 code
executing on an Intel EVA-960 Software
Execution Vehicle or on a hardware target
system via a serial interface. The EVA-960
targeted debugger uses I/O resources provided
by the PC, while 80960 code executes at high
speed on the EVA-960. Two serial versions of
DB-960 are available. DB-960CADIC plugs
directly into the 80960CA socket on your
prototype, offering a “plug-in and go” debug
environment. DB-960D is a serial, retargetable
version of DB-960 whose system debug monitor
can be customized for 80960SA/SB, 80960KA/KB,
or 80960CA operation.


Ease of learning. Drop-down menus make the
debugger easy to learn for new or casual users.
A command line interface allows direct
command entry for solving more complex
problems, improving productivity of
knowledgeable users.


Extensive debug modes. You can set 
conditional breakpoints, pass points, and 
temporary breakpoints as needed. 


See into your program. Using pull-down
menus or function keys, you can browse source
and Call stacks, monitor processor registers,
view screen output, and watch the values of
variables change.


Full debug symbolics for maximum
productivity. You need not know whether a
variable is an unsigned integer, a real, or a
structure: the debugger displays program
variables in their respective type formats.


In-Target Debug. Porting the DB960D 
retargetable monitor to your target system 
allows the debugger to be used in-target, thus 
facilitating debugging of code dependent upon 
hardware interaction. 
ICE960SB IN-CIRCUIT
EMULATOR


ICE960SB is a full featured in-circuit emulator
for the 80960SA and 80960SB components. A
separate ICE probe can be purchased to
support 80960KA and 80960KB components.
ICE960SB includes:

• Full speed emulation of the 80960SA/SB
  components to 16 MHz
• Complete symbolic information when used
  with Intel 80960 compilers
• 1024 Frames Bus or Execution Trace with
  Time-Tags
• Comprehensive break capabilities including
  execution addresses, instruction type, bus
  read/write/access, data value and external
  synch lines
• Qualification of break conditions based on an
  8-state machine or an occurrence counter
• Fastbreaks to dynamically access memory or
  variables during emulation
• Examine and modify memory and 80960
  registers
• Stand-Alone Self-Test module provides
  diagnostic circuitry and 256 Kbytes of
  memory for software development
• Optional 2 Mbyte of relocatable expansion
  memory
• Support for socketed and surface mounted 84
  Pin PLCC components and surface mounted
  80 Pin EIAJ components via ONCE mode
• DOS Hosting with support for RS232 and
  RS422 communication links


WORLDWIDE SERVICE,
SUPPORT, AND TRAINING


To augment its development tools, Intel offers
a full array of seminars, classes, and
workshops, field application engineering
expertise, hotline technical support, and on-site
service.


Intel also offers a Software Support package
which includes technical software information,
telephone support, automatic distribution of
software and documentation updates, access to
the “ToolTalk” electronic bulletin board,
“iComments” publication, remote diagnostic
software, and a development tools
troubleshooting guide.


Intel’s Hardware Support package includes
technical hardware information, telephone
support, warranty on parts, labor, material,
and on-site hardware support.
80960SA/SB DEVELOPMENT TOOLS

ASM960          Assembler package containing the assembler,
                linker/loader, macro preprocessor, archiver,
                ROM image builder, other object file utilities,
                and the 80960SA/KA/CA floating point arithmetic
                library.

C960            Optimizing C Compiler, with ANSI extensions for
                embedded control applications; contains standard
                STDIO libraries and in-line assembly capability.

GEN960          80960 System Generation software automates the
                compilation, assembly and linking process.
                Simplifies usage of 80960 sophisticated features.

DBSIM960KA      Debugging Simulator software emulates the
                80960SA and 80960KA instruction set allowing
                code development and debugging prior to hardware
                prototype availability.

DB960KBDEVA     Source Level Debugger software for the
                80960KB/KA with powerful debug capabilities
                including conditional breakpoints, source and
                Call stack browsing, memory/register display and
                modification, and ability to watch variables
                change value. Requires EVA-960KB4MB Software
                Execution Vehicle. For PC AT hosted systems only.

DB960D          Source Level Debugger software for 80960SA/SB,
                80960KA/KB, or CA processors resident on
                serially-interfaced hardware prototype systems.
                Includes customizable system debug monitor and
                serial interface protocol specifications. For
                PC AT hosted systems only.

EVA960KB4MB     Software Execution Vehicle for 80960SA/SB and
                80960KA/KB components. Includes 4 Mbyte of
                on-board memory, system debug monitor and code
                download software. Code compatible with the
                80960SA/SB components. Required by DB960KBDEVA.

EVA960KB16MB    Identical to EVA960KB4MB with 16 Mbyte of DRAM
                instead of 4 Mbyte.

ICE960SB        In-Circuit emulator for the 80960SA/SB
                components. Includes ICE base and probe,
                stand-alone self-test module, and your choice of
                PLCC or PQFP target adapters. Optional 2 Mbyte
                relocatable expansion memory option provides
                overlayable memory for software prototyping and
                hardware debugging.
ARCHITECTURE EVALUATION
STARTER KITS

960SKit3        Contains ASM960D Assembler and iC960D Compiler

DB960KIT2       Kit contains DB-960KBDEVA (KB version of DB-960
                used with EVA-960), EVA960KB4MB Software
                Execution Vehicle, ASM960D and C960E. Requires
                PC AT with 640K memory.

DB960KIT3       Kit contains DB-960D (serial version of DB-960
                supporting the 80960SA/SB, 80960KA/KB and
                80960CA components, operating on PC-AT/DOS),
                ASM960D and C960D. Requires PC AT with
                640K memory.


Product Code to order, by Host

Host columns: PC-AT/DOS, UNIX-386 V.4, OS/2, Sun 3/UNIX,
HP9000/HP-UX, VAX/ULTRIX, uVAX/ULTRIX

Product categories: Assembler, C Compiler, System Gen, SX Debugger,
KX Debugger, CA Debugger, SA Simulator, CA Simulator, ICE960SB,
ICE960KB

Product codes listed: ASM960D, C960D, GEN960D, DB960D, DB960KBDEVA,
DB960D, DB960CADIC, DB960D, SIM960CAD, ICE960SB, ICE960KB
ICE™-960SB AND ICE-960KB
IN-CIRCUIT EMULATOR


INTERCHANGEABLE PROBES 


The ICE™-960 in-circuit emulator delivers real-time hardware and software debugging
capabilities for i960™ SA/SB and i960 KA/KB-based designs. Features include
full-speed emulation of each of the microprocessors, powerful breakpoint specification,
fastbreaks, optional relocatable expansion memory, two types of trace capability, large
trace buffering, sophisticated human interface and high-speed communication links with
the DOS host. The ICE-960 in-circuit emulator gives you unmatched control over all
phases of hardware/software debug, including developing, integrating and testing, which
improves development productivity and shortens time to market.


FEATURES 


• Real-Time Emulation of the i960 KA/KB
  microprocessors up to 25 MHz and
  emulation of the i960 SA/SB to 16 MHz
• Full symbolic integration with Intel
  ASM and C compilers
• Optional ICE960KBREM/ICE960SBREM
  boards provide 2 Mbytes of ICE memory
  which can overlay user ROM or RAM
• Examine and modify memory and the
  i960 registers


280852-1 


• Dynamically monitor and update
  program variables via fastbreaks
• Breakpoint capabilities include:
  execution address, instruction type, bus
  read/write/access, and data value.
  Qualification of events is based on an
  occurrence counter and an 8-state
  state machine
• Hosted on IBM PC AT* or compatible and
  supporting RS232, RS422 and Ethernet
  operation
• 1024 frame trace buffer for execution and/or
  bus trace and time tags
• The on-chip cache does not affect collection of
  the execution trace
• 256 Kbytes of memory in standalone self-test
  (SAST) unit
• Real-time bus trace with time-tags for
  tracking code execution time
• Assembly and disassembly of code in i960
  instruction mnemonics
• ICE to component interconnect includes
  support for surface-mounted and socketed
  84-pin PLCC and surface mounted 80-pin
  EIAJ QFP i960 SA/SB and a PGA for
  i960 KA/KB


The ICE-960 in-circuit emulator enables
emulation of the i960 SA/SB at speeds to
16 MHz and the i960 KA/KB at speeds to
25 MHz, thus providing early detection of
subtle timing problems that may arise at full
speed. Intel’s intimate knowledge of the
component makes possible the tightest
conceivable conformance between timing
parameters of the emulator and the target
microprocessor.


PROCESSOR/MEMORY 
EXAMINATION AND 
MODIFICATION 


The i960 registers can be accessed
mnemonically (e.g. g12, r5, fp3) with the ICE-960
emulator software. Data can be displayed
or modified in hexadecimal, decimal, octal, or
binary and by data type (byte, word, etc).
Program memory contents can be modified as
i960 assembly instruction mnemonics.
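
The display options described above amount to masking a value to the width of its data type and rendering it in the chosen radix. The sketch below shows one way to do that; the function name `show` and the width table are illustrative choices and are not taken from the ICE-960 software.

```python
# Illustrative only: widths follow the common i960 convention
# (byte = 8 bits, short = 16 bits, word = 32 bits).
WIDTH_BITS = {"byte": 8, "short": 16, "word": 32}


def show(value, data_type="word"):
    """Render a value in all four radixes, truncated to the data type's width."""
    bits = WIDTH_BITS[data_type]
    v = value & ((1 << bits) - 1)          # keep only the bits that fit
    return {
        "hex": format(v, "0{}X".format(bits // 4)),
        "decimal": str(v),
        "octal": format(v, "o"),
        "binary": format(v, "0{}b".format(bits)),
    }


as_byte = show(0x1A5, "byte")   # upper bits are discarded for a byte view
as_word = show(0x1A5, "word")
```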


PROGRAM TRACING 


The ICE-960 emulator can store 1024 frames of
program execution history or processor
address/data bus activity in the trace buffer. Each
frame of program execution contains a
discontinuity address (branch, call, return, etc)
and a time-tag. This information can be used to
reconstruct a history of the program execution.
With the execution trace option enabled, the
ICE-960 will run at less than full speed. Each
trace frame of bus cycles contains one complete
bus burst trace. Collection of trace information
is controlled by a logic analyzer type moving
trace window and by bus access type.
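
A fixed-depth trace buffer of this kind can be pictured as a ring buffer of frames. The sketch below is a software analogy only: it assumes the oldest frame is discarded once 1024 frames are held, and the class and field names are invented for the example.

```python
from collections import deque, namedtuple

# One execution-trace frame as described in the text: a discontinuity
# address (branch, call, return target) plus a time-tag.
TraceFrame = namedtuple("TraceFrame", ["discontinuity_address", "time_tag"])


class TraceBuffer:
    """Fixed-depth buffer; assumes the oldest frame is dropped when full."""

    def __init__(self, depth=1024):
        self._frames = deque(maxlen=depth)

    def record(self, address, time_tag):
        self._frames.append(TraceFrame(address, time_tag))

    def is_full(self):
        return len(self._frames) == self._frames.maxlen

    def history(self):
        """Frames in the order they were recorded, oldest first."""
        return list(self._frames)


buf = TraceBuffer()
for i in range(1500):                      # record more frames than fit
    buf.record(0x8000 + 4 * i, time_tag=i)
```

After the loop only the most recent 1024 frames remain, which is why a trace window that can be positioned around the event of interest matters in practice.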




EVENT RECOGNITION 
(BREAKPOINT CONTROL) AND 
EMULATION CONTROL 


ICE-960 provides comprehensive event
recognition capabilities including: two
hardware and thirty-two software breakpoints
for instruction execution breakpoints, and use
of the internal debug registers to recognize
execution of certain instruction types such as
branch or call instructions. Bus analysis logic
provides recognition of external bus addresses
qualified by read, write, or access type as well
as data values. The data values may be entered
as masked values and qualified by type. Two
synchronization lines are provided for
recognition of external events. ICE-960 also
provides qualification of events based on an
occurrence counter or by a recognition
sequence of up to 8 events. Additionally,
emulation can be automatically stopped when
the trace buffer is full. Besides the ability to
execute program code at full speed between
specified points, the ICE-960 emulator provides
the ability to single-step through program
code.
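
The two qualification mechanisms mentioned above, an occurrence counter and a recognition sequence of up to 8 events, can be modeled in a few lines. This is a conceptual sketch under simple assumptions (events are tested one bus cycle at a time, and a sequence advances only when the next expected event matches); the class names and the dictionary format of a bus cycle are invented for illustration.

```python
class OccurrenceCounter:
    """Reports a break once the event has been seen `target_count` times."""

    def __init__(self, target_count):
        self.target_count = target_count
        self.seen = 0

    def observe(self, matched):
        if matched:
            self.seen += 1
        return self.seen >= self.target_count


class SequenceRecognizer:
    """Reports a break after its events have matched in order (at most 8)."""

    MAX_EVENTS = 8

    def __init__(self, predicates):
        if not 1 <= len(predicates) <= self.MAX_EVENTS:
            raise ValueError("a sequence holds between 1 and 8 events")
        self.predicates = list(predicates)
        self.state = 0                      # index of the next expected event

    def observe(self, bus_cycle):
        if (self.state < len(self.predicates)
                and self.predicates[self.state](bus_cycle)):
            self.state += 1
        return self.state == len(self.predicates)


counter = OccurrenceCounter(3)
counter_hits = [counter.observe(m) for m in [True, False, True, True]]

seq = SequenceRecognizer([
    lambda c: c["kind"] == "write" and c["address"] == 0x1000,
    lambda c: c["kind"] == "read" and c["address"] == 0x2000,
])
cycles = [
    {"kind": "read", "address": 0x2000},
    {"kind": "write", "address": 0x1000},
    {"kind": "read", "address": 0x3000},
    {"kind": "read", "address": 0x2000},
]
sequence_hits = [seq.observe(c) for c in cycles]
```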


RELOCATABLE EXPANSION
MEMORY


An optional board provides ICE-960 with 2
Mbytes of relocatable expansion memory
which allows users to develop applications
either before the target system memory is
working, or in place of ROM or EPROM to
speed the debugging cycle. This memory can be
mapped in two separate 1 Mbyte partitions on
1 Mbyte boundaries.
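
The mapping rule above (two 1 Mbyte partitions, each on a 1 Mbyte boundary) can be checked with simple arithmetic. The helper names below are hypothetical and are not the ICE MAP command; the sketch only illustrates the alignment constraint.

```python
MBYTE = 1 << 20
PARTITION_COUNT = 2   # the text describes two separate 1 Mbyte partitions


def validate_partitions(bases):
    """Return the (start, end) range of each partition, or raise ValueError."""
    if len(bases) != PARTITION_COUNT:
        raise ValueError("exactly two partition base addresses are expected")
    for base in bases:
        if base % MBYTE != 0:
            raise ValueError("base 0x%08X is not on a 1 Mbyte boundary" % base)
    if bases[0] == bases[1]:
        raise ValueError("the two partitions overlap")
    return [(base, base + MBYTE - 1) for base in bases]


def is_mapped(address, bases):
    """True if the address falls inside one of the mapped partitions."""
    return any(base <= address < base + MBYTE for base in bases)


bases = [0x00000000, 0x00300000]
ranges = validate_partitions(bases)
```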


For the new ICE960KBREM board, the
memory waitstate pattern is (3,1,1,1) when the
user’s system does not return RDY# for
accesses in the mapped area. For accesses
where the user’s system does return RDY# for
these areas, the waitstate pattern will be the
larger of (3,1,1,1) or the user waitstate pattern plus
(2,2,2,2). For either board, the size and shape of
the board is identical to the ICE probe and is
installed between the probe and the user’s
target system when in use. The memory
configuration can be specified via an ICE MAP
command.
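
As a worked example of the waitstate rule above, the sketch below computes the resulting pattern for a four-access burst. It assumes that "the larger of" is applied to each access in the burst separately; the text does not say this explicitly, so treat the per-access comparison as an interpretation rather than a specification.

```python
REM_PATTERN = (3, 1, 1, 1)    # pattern when the target does not return RDY#
REM_OVERHEAD = (2, 2, 2, 2)   # added to the user's pattern when it does


def effective_pattern(user_pattern=None):
    """Waitstate pattern seen by the processor for a mapped access.

    user_pattern is None when the target system does not return RDY#.
    Assumption: the comparison is made access by access within the burst.
    """
    if user_pattern is None:
        return REM_PATTERN
    padded = tuple(u + o for u, o in zip(user_pattern, REM_OVERHEAD))
    return tuple(max(r, p) for r, p in zip(REM_PATTERN, padded))


no_ready = effective_pattern()
zero_wait_target = effective_pattern((0, 0, 0, 0))
slow_target = effective_pattern((2, 1, 1, 1))
```

Under this reading, even a zero wait-state target memory is seen through the REM board with a (3,2,2,2) pattern.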


The ICE960KBREM/ICE960SBREM cards add
some constraints when used with the ICE in a
user’s target system. First, users should qualify
bus drivers/buffers with DEN# in order to
eliminate potential bus conflict between the
REM board and their target memory while
using the ICE. Second, the 1 Mbyte partition
size cannot be reduced and may affect the
design of the user’s memory subsystem. Third,
the REM boards delay the ADS# and DEN#
signals by 5 ns (typical) and delay the RDY#
signal by 4 ns (typical). Fourth, they add loading,
capacitance, and power requirements as shown
in Tables 3 and 4.


STANDALONE OPERATION 


Product software can be developed and 
debugged prior to and independent of 
hardware availability with the Standalone Self 
Test unit (SAST), which contains 256 Kbytes of 
two wait-state program memory. The SAST 
also provides diagnostic testing to assure full 
functionality of the ICE-960 emulator. 


VERSATILE AND POWERFUL
HOST SOFTWARE


ICE-960 provides an easy-to-use human 
interface which utilizes color forms to 
complement a powerful command set. The 
software includes: an on-line help facility, a 
dynamic command entry and syntax guide, 
screen oriented editor, assembler and 
disassembler, input/output redirection, 
command piping, DOS command entry, and 
the ability to customize the command set via 
debug procedures and literal definitions. 


DEBUG PROCEDURES AND
LITERALS


Debug procedures (PROCs) are user-defined 
groups of ICE960 emulator commands. They 
can be stored on disk and recalled during later 
debugging sessions. PROCs can be used to 
simplify the process of debugging by grouping 
repetitive emulator commands, which can then 
be accessed by typing the name of the PROC. 
Literals are user-defined abbreviations for
whole or partial ICE-960 emulator commands.
Literals are a shorthand method of
customizing the emulator commands to fit
your needs and preferences.


ICE TO COMPONENT 
INTERCONNECT SYSTEM 


Using the On-Circuit Emulation (ONCE) i960 
SA/SB silicon feature, ICE960SB can be used 
in systems with surface-mounted i960 SA/SB 
components in either PLCC or EIAJ QFP 
packages. The hinge cable adapters included in 
the various ICE kits and pictured to the right, 
are placed directly on top of the surface 
mounted i960 SA/SB device. The circuitry 
necessary for the emulator to take control 
from the target processor is fully supported in 
the emulator. No additional circuitry is 
required. 


Socketed i960 SA/SB 
components in PLCC packages and i960 KA/KB 
components in PGA packages are also 
supported. Please see Figures 1, 2, 3, and 4 for 
ICE Probe physical characteristics. Refer to 
Table 5 for hinge cable loading and delay 
characteristics. 


WORLDWIDE SERVICE, 
SUPPORT, AND TRAINING 


To augment its development tools, Intel offers 
a full array of seminars, classes, workshops, 
field application engineering expertise, hotline 
technical support, and on-site service. 


Intel also offers a Software Support contract 
which includes technical software information, 
automatic distributions of software and 
documentation updates, the iCOMMENTS 
publication, remote diagnostic software, and a 
development tools troubleshooting guide. 
HIGH-SPEED HOST-TO-ICE 
COMMUNICATIONS 
PROTOCOLS 


ICE-960 supports RS232 and RS422 
communications protocols to 115.2 KBaud and 
1152 KBaud respectively, depending upon the 
ability of the host to support the specific rate. 
Testing for these systems and the 
configurations involved are described in the 
following sections. 


SPECIFICATIONS 


HOST REQUIREMENTS 


IBM PC-AT (minimum requirements) with 640 
KBytes of conventional memory 

1 MByte of RAM (Lotus, Intel, Microsoft 
expanded memory specification) 

20 MByte Fixed Disk 

At least one 5¼" or 3½" Floppy Disk drive 
RS232 or RS422 Communication Interface 
DOS Operating System (version 3.2 or 3.3) 


TESTED HOST 
CONFIGURATIONS 


IBM PC-AT with DOS 3.3. Tested with built-in 
RS232 and a Quatech DS202 Asynchronous 
RS422 Communications Board with 16550 
Option 


MECHANICAL SPECIFICATIONS 


TABLE 1. ICE-960 Emulator Physical Characteristics _ 


*Measurement includes target adaptor 


Intel’s 90-day Hardware Support package 
includes technical hardware information, 
warranty on parts, labor, material, and on-site 
hardware support. 


Intel Development Tools also offers a 30-day, 
money-back guarantee to customers who are 
not satisfied after purchasing any Intel 
development tool. 


COMPAQ Deskpro 386* with DOS 3.3. 
Tested with built-in RS232 and Quatech DS202 
Asynchronous RS422 Communications Board 
with 16550 Option 


Systems Based on an Intel 301/302™ Box 
with DOS 3.3. Tested with built-in RS232 to 
115.2 KBaud and a Quatech DS202 
Asynchronous RS422 Communications Board 
with 16550 Option to 1.152 MBaud 


IBM Personal System/2* with DOS 4.01. 
Tested with built-in RS232 


REQUIRED SYSTEM 
RESOURCES 


The ICE-960 emulator requires the following: 
a) exclusive use of the i960 SA/SB or i960 KA/KB’s 
on-chip debug registers and b) a 
minimum of 256 bytes of target system RAM 
used to flush the i960 local registers. 
Figure 1: ICE960KB25 Processor Module (side view and top view; drawing 280852-2) 


Figure 2: Optional Isolation Board (side view and top view; drawing 280852-3) 
Figure 3: ICE960SB16C Adapter (PLCC hinge cable dimensions: PLCC footprint, underside of 
PGA socket, and side views; all measurements in centimeters; drawing 280852-4) 
ELECTRICAL SPECIFICATIONS 


SYNC Line Specification 


The SYNCIN line must be valid for at least one 
instruction cycle because it is only sampled on 
bus access boundaries. The SYNCIN line is a 
standard TTL input. The SYNCOUT line is 
driven by a TTL open collector with a 4.75 KΩ 
pull-up resistor. 


AC/DC Specifications 


The Optional Isolation Board (OIB) isolates the 
ICE-960 probe from an untested user target 
system. When the OIB is in use, the ICE-960 
AC and DC specifications differ from the i960 
microprocessor as shown below. When the OIB 
is not installed, the ICE-960KB timing 
specifications are identical to those of the i960 
component. 

TABLE 2. AC Specifications with the OIB Installed 


Parameter                16 MHz 80960SB     25 MHz 80960KB 


Clock Period 
Clock Low Time 
Clock High Time 
Clock Fall Time 
Clock Rise Time 


Output Valid Delay 
A(2:3), BE#(0:1), BLAST #,* 

DEN #, DTR#, WR#** 

A/D Lines*** 


T6AS | AS Valid Delay (AS#) 
ALE# Width 
ALE# Valid Delay 
HLDA, INTO#, INT1, INT2, INT3# 


HOLD, READY #, LOCK # 


T12 | Input Setup 2 | 
HOLD, READY #, LOCK # 


Output Float Delay 
A(2:3), BE #(0:1), BLAST #,* 
DEN #, DTR#, WR#** 

A/D Lines 


Input Hold 


*Tp_y dependent on termination for KB control signals 
**OIB does not float A/D bus during T, and T; (between bus cycles) 
***Output Valid Delay for control signals after HOLD ACKNOWLEDGE is deasserted 50 ns for 80960SB and 43 ns for 80960KB 
TARGET SYSTEM DESIGN 
CONSIDERATIONS 


In addition to the mechanical, power 
consumption, and signal loading 
considerations for the ICE probe, the following 
points should be taken into account when the 
target system is being designed: 


1) [SA/SB/KA/KB/MC] 
The bus should not be driven by an 
external source unless DEN# is asserted. 

2) [SA/SB/KA/KB/MC] 
The LOCK# signal must be terminated as 
recommended in the 80960SA/SB 
component data sheet. 

3) [SA/SB/KA/KB/MC] 
To guarantee timings, the ICE requires 
±5% supply voltage to the target system 
(i.e., ICE probe power). 

4) [SA/SB] 
To ensure correct bus trace the ICE requires 
a data hold time (T11) of 4 ns. 

5) [SA/SB/KA/KB/MC] 
Each Vcc and GND pin of the processor 
must be connected to the appropriate 
voltage or ground and externally strapped 
close to the package. 

6) [SA/SB/KA/KB/MC] 
Processor no connect (N.C.) pins must be 
left disconnected. 
TABLE 4. Additional DC Loading 

TABLE 5. 80960SB PLCC Hinge Cable Loading and Delay 


Signal Loading    15 pF typical 
Signal Delay      Signals from processor delayed 4 ns typical; setup and hold timings unaffected 
ORDERING INFORMATION 


Order Code      Description 


ADPT80EIAJ      Hinge Cable Adapter for surface-mount i960SB EIAJ 
                QFP packages. This adapter is included in the 
                ICE960SB16J kit. 

ADPT84PLCC      Hinge Cable Adapter for surface-mount and socketed 
                i960SB PLCC packages. This adapter is included in the 
                ICE960SB16C kit. 

ICE960SB16C     ICE960 base, i960 SA/SB probe, 84-pin PLCC 
                surface-mount and socketed target component 
                interconnect, and RS232 and RS422 communication 
                cables. (Shrink-Wrap license, Class 1) 

ICE960SB16J     ICE960 base, i960 SA/SB probe, 80-pin EIAJ 
                surface-mount target component interconnect, and 
                RS232 and RS422 communication cables. 
                (Shrink-Wrap license, Class 1) 

ICE960KB25      ICE960 base, i960 KA/KB probe, 132-pin PGA target 
                component interconnect, and RS232 and RS422 
                communication cables. (Shrink-Wrap license, Class 1) 

ICE960SBREM     Optional 2 MByte Relocatable Expansion Memory 
                Board for i960 SA/SB components. 

ICE960KBREM     Optional 2 MByte Relocatable Expansion Memory 
                Board for 80960KA/KB components. 

PTOI960SB16     Probe and Software to convert ICE960KB25 to 
                ICE960SB16. An ADPT80EIAJ or ADPT84PLCC adapter kit 
                should also be ordered with this package to support the 
                component packaging type of your choice. 
                (Shrink-Wrap license, Class 1) 

PTOI960KB25     Probe and Software to convert ICE960SB16C or 
                ICE960SB16J to ICE960KB25. 
                (Shrink-Wrap license, Class 1) 


IN-CIRCUIT EMULATOR FOR THE 80960MC 


MICROPROCESSOR 


280899-1 


The ICE™-960MC In-circuit Emulator delivers real-time hardware and software 
debugging capabilities for 80960MC based designs. Features include emulation of the 
80960MC microprocessor, powerful breakpoint specification, fastbreaks, optional 
relocatable expansion memory, two types of trace capability, large trace buffering, 
support of virtual and physical component addressing modes, and sophisticated human 
interface. The ICE-960MC In-circuit Emulator gives you unmatched control over all 
phases of hardware/software debug, including developing, integrating and testing, which 
improves development productivity and speeds time to market. 


FEATURES 


• Real-Time Emulation of the 80960MC 
  microprocessors up to 20 MHz (25 MHz 
  optional) 

• Full Symbolic Information Relating to 
  Code. Data symbolics subject to some 
  limitations in virtual addressing mode 

• Optional ICE960KBREM Board Provides 
  2 Mbytes of ICE Memory Which Can 
  Overlay User ROM or RAM 

• Zero wait-state operation from user 
  memory 

• Examine and modify Memory and the 
  80960 Registers 

• Breakpoint Capabilities include: 
  Execution Address, Instruction Type, 
  Bus Read/Write/Access, and Data 
  Value. Qualification of Events is Based 
  on an Occurrence Counter and an 8 state 
  State-Machine 

• Hosted on IBM PC AT or compatible 

• Dynamically monitor or update program 
  variables or memory during emulation 
  with Fastbreaks 

• 1024 Frame Trace Buffer for execution 
  and/or Bus Trace and time tags 

• 256 Kbytes of Memory in Standalone 
  Self-Test (SAST) Unit 


November 1990 
Order Number: 280899-001 


ICE™-960MC IN-CIRCUIT EMULATOR 


REAL-TIME EMULATION 

The ICE-960MC In-circuit Emulator provides 
emulation of the 80960MC at speeds up to 20 
MHz (25 MHz optional), thus providing early 
detection of subtle timing problems. Intel’s 
intimate knowledge of the component makes 
possible the tightest conceivable conformance 
between timing parameters of the emulator 
and the target microprocessor. 


PROCESSOR/MEMORY 
EXAMINATION AND 
MODIFICATION 


The 80960MC registers can be accessed 
mnemonically (e.g. g12, r5, fp3) with the 
ICE-960MC emulator software. Data can be 
displayed or modified in one of four bases 
(hexadecimal, decimal, octal, or binary) and by 
data type (byte, word, etc.). Program memory 
contents can be disassembled and displayed as 
80960 assembly instruction mnemonics. 
Additionally, 80960 assembly instruction 
mnemonics can be assembled and stored into 
program memory. 80960MC system data 
structures such as the segment table, dispatch 
port, and page tables can also be accessed and 
modified mnemonically. 


PROGRAM TRACING 


The ICE-960MC emulator can store 1024 
frames of program execution history or 1024 
frames of the 80960MC address/data bus 
activity in the trace buffer. Each frame of 
program execution contains a discontinuity 
address (branch, call, return, etc.) and a 
time-tag. This information can be used to 
reconstruct a history of the program execution. 
With the execution trace option enabled, the 
ICE-960MC will run at less than full speed. 
Each trace frame of bus cycles contains one 
complete bus burst trace. Collection of trace 
information is controlled by a logic analyzer 
type moving trace window and by bus access 
type. 


EVENT RECOGNITION 
(BREAKPOINT CONTROL) AND 
EMULATION CONTROL 


ICE-960MC provides comprehensive event 
recognition capabilities including: two 
hardware and thirty-two software breakpoints 
for instruction execution breakpoints, and use 
of the internal debug registers to recognize 
execution of certain instruction types such as 
branch or call instructions. Bus analysis logic 
provides recognition of external bus addresses 
qualified by read, write, or access type as well 
as data values which may be entered as 
masked values. Two synchronization lines are 
provided for recognition of external events. 


ICE-960MC also provides qualification of 
events based on an occurrence counter or by a 
recognition sequence of up to 8 events. Special 
additions for the 80960MC include the ability 
to recognize process binds. Additionally, 
emulation can be automatically stopped when 
the trace buffer is full. Besides the ability to 
execute program code at full speed between 
specified points, the ICE-960MC emulator 
provides the capability to single-step 
program code. 
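
The event qualification described above can be pictured with a small software model. The sketch below is illustrative only: the class name, the event strings, and the choice to combine the occurrence counter with the recognition sequence in one object are assumptions, and it does not represent the emulator's actual command set or hardware.

```python
class EventQualifier:
    """Toy model of event qualification (hypothetical, not an ICE-960MC API).

    Events in `sequence` must be observed in order; events that do not match
    the one currently awaited are ignored. A break is signalled once the whole
    sequence has been completed `count` times.
    """

    MAX_EVENTS = 8  # the text allows a recognition sequence of up to 8 events

    def __init__(self, sequence, count=1):
        if not 1 <= len(sequence) <= self.MAX_EVENTS:
            raise ValueError("sequence must contain 1 to 8 events")
        self.sequence = list(sequence)
        self.count = count
        self.position = 0  # index of the event currently awaited
        self.hits = 0      # number of completed sequences so far

    def observe(self, event):
        """Feed one observed event; return True when emulation should break."""
        if event == self.sequence[self.position]:
            self.position += 1
            if self.position == len(self.sequence):
                self.position = 0
                self.hits += 1
                return self.hits >= self.count
        return False


qualifier = EventQualifier(["write 0x1000", "call"], count=2)
observed = ["call", "write 0x1000", "read", "call", "write 0x1000", "call"]
breaks = [qualifier.observe(event) for event in observed]
print(breaks)  # only the final event completes the sequence a second time
```

In this model the break fires on the last event only, because the write-then-call sequence must complete twice before the occurrence count is satisfied.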


RELOCATABLE EXPANSION 
MEMORY 


An optional board provides ICE-960MC with 2 
Mbytes of relocatable expansion memory 
which allows users to develop applications 
either before the target system memory is 
working, or in place of ROM or EPROM to 
speed the debugging cycle. This memory can be 
mapped in two separate 1 Mbyte partitions on 
1 Mbyte boundaries. The memory waitstate 
pattern is (3,1,1,1) when the user’s system does 
not return RDY# for accesses directed to the 
ICE960KBREM board. For accesses where the 
user system does return RDY# the waitstate 
pattern will be the larger of (3,1,1,1) or user 
waitstate pattern plus (2,2,2,2). The size and 
shape of the board is identical to the ICE probe 
and is installed between the probe and the 
user’s target system when in use. The memory 
configuration can be mapped via either an ICE 
MAP command or via switches on the 
ICE960KBREM board. 
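
The waitstate rule above can be worked through numerically. The following sketch assumes that "the larger of" is applied word by word across the four-word burst; the text does not say whether the comparison is per word or on the pattern as a whole, and the function and constant names are hypothetical.

```python
REM_BASE_PATTERN = (3, 1, 1, 1)  # pattern when the target does not return RDY#
REM_ADDED_DELAY = (2, 2, 2, 2)   # added to the user's pattern when it does


def effective_waitstates(user_pattern=None):
    """Return the wait states seen on each of the four words of a burst.

    user_pattern is None when the target system does not return RDY#.
    """
    if user_pattern is None:
        return REM_BASE_PATTERN
    delayed = tuple(u + d for u, d in zip(user_pattern, REM_ADDED_DELAY))
    # Assumption: "larger of" is evaluated separately for each word.
    return tuple(max(b, d) for b, d in zip(REM_BASE_PATTERN, delayed))


print(effective_waitstates())              # (3, 1, 1, 1)
print(effective_waitstates((0, 0, 0, 0)))  # (3, 2, 2, 2)
print(effective_waitstates((2, 1, 1, 1)))  # (4, 3, 3, 3)
```

Under this reading, even a zero-wait-state target sees at least (3,2,2,2) once it returns RDY# for accesses mapped to the expansion memory.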


The ICE960KBREM card adds some 
constraints when used with the ICE in a user’s 
target system. First, users should qualify bus 
drivers/buffers with DEN# in order to 
eliminate potential bus conflict between 
REM960 and their target memory. Second, the 
1 Mbyte partition size cannot be reduced and 
may affect the design of the user’s memory 
subsystem. Third, ICE960KBREM delays the 
ADS# and DEN# signals by 5 nsec (typical) 
and delays the RDY# signal by 2 nsec (typical). 
Fourth, it adds loading, capacitance, and power 
requirements as shown in Tables 3 and 4. 


STANDALONE OPERATION 


Product software can be developed and 
debugged prior to and independent of 
hardware availability with the Standalone Self 
Test unit (SAST), which contains 256 Kbytes of 
two wait-state program memory. The SAST 
also provides diagnostic testing to assure full 
functionality of the ICE-960MC emulator. 


VERSATILE AND POWERFUL 
HOST SOFTWARE 


ICE-960MC provides an easy-to-use human 
interface which utilizes color and pull-down 
menus to complement a powerful command 
set. The software includes: an on-line help 
facility, a dynamic command entry and syntax 
guide, screen oriented editor, assembler and 
disassembler, input/output redirection, 
command piping, DOS command entry, and 
the ability to customize the command set via 
debug procedures and literal definitions. 


Special software commands are provided to 
display, interpret, and modify the 80960MC 
hardware data structures including the 
segment table, dispatch port, process control 
block, and the page tables and directories. 


DEBUG PROCEDURES AND 
LITERALS 


Debug procedures (PROCs) are user-defined 
groups of ICE-960MC emulator commands. 
They can be stored on disk and recalled during 
later debugging sessions. PROCs can be used to 
simplify the process of debugging by grouping 
repetitive emulator commands, which can then 
be accessed by typing the name of the PROC. 
Literals are user-defined abbreviations for 
whole or partial ICE-960MC emulator 
commands. Literals are a shorthand method of 
customizing the emulator commands to fit 
your needs and preferences. 
WORLDWIDE SERVICE, 
SUPPORT, AND TRAINING 


To augment its development tools, Intel offers 
a full array of seminars, classes, and 
workshops, field application engineering 
expertise, hotline technical support, and 
on-site service. 


Intel also offers a Software Support contract 
which includes technical software information, 
telephone support, automatic distribution of 
software and documentation updates, access to 
the “ToolTalk” electronic bulletin board, 
“iComments” publication, remote diagnostic 
software, and a development tools 
troubleshooting guide. 


Intel’s Hardware Support package includes 
technical hardware information, telephone 
support, warranty on parts, labor, material, 
and on-site hardware support. 

SPECIFICATIONS 


HOST REQUIREMENTS 


IBM PC AT (minimum requirement) with 640 
KB of conventional memory 

• 1 MB of RAM (Lotus, Intel, Microsoft 
  expanded memory specification) 

• 20 MB Fixed Disk 

• At least one 5¼" Floppy Disk drive 

• A serial interface 

• DOS Operating system (version 3.2 or later 
  excluding 4.x) 


REQUIRED SYSTEM 
RESOURCES 


The ICE-960MC emulator requires the 
following: a) exclusive use of the 80960MC’s 
on-chip debug registers and b) a minimum of 256 
bytes of target system RAM used to flush the 
80960 local registers. 


Mechanical Specifications 


TABLE 1. ICE-960MC Emulator Physical Characteristics 


                      Width          Height         Length         Weight 
Unit                  Inches  cm     Inches  cm     Inches  cm     lbs   kg 

Control unit          6.0 
Processor module*     3.8     9.6    1.5     3.8    5.0     12.7 
SAST                  6.0     15.2   2.0     5.1    8.0     20.3   3.5   1.59 
OIB                   3.8     9.6    0.9     2.3    5.1     13.0 
Power supply          2.8     7.1    4.2     10.7   11.0    27.9   4.7   2.14 
User cable 
Serial cable 


*Measurement includes target adaptor 
Figure 1: Processor Module (side view and top view; drawing 280899-2) 


Figure 2: Optional Isolation Board (side view and top view; drawing 280899-3) 
ELECTRICAL SPECIFICATIONS 


SYNC Line Specification 


The SYNCIN line must be valid for at least one 
instruction cycle because it is only sampled on 
instruction boundaries. The SYNCIN line is a 
standard TTL input. The SYNCOUT line is 
driven by a TTL open collector with a 4.75K-ohm 
pull-up resistor. 


AC/DC Specifications 


The Optional Isolation Board (OIB) isolates the 
ICE-960MC probe from an untested user target 
system. When the OIB is in use, the ICE-960MC 
AC and DC specifications differ from 
the 80960MC microprocessor as shown below. 
When the OIB is not installed, the ICE-960MC 
specifications are identical to those of the 
80960MC component. 


TABLE 2. AC Specifications With The OIB Installed 


Symbol*  Parameter                                  Minimum      Maximum 

t2       clock low time                             t2+1 ns 
t3       clock high time                            t3+1 ns 
t6       output valid delay 
           A/D 0:31                                 t6+8 ns      t6+16 ns 
           DT/R#, DEN#, BE0-3#, ADS#, W/R#          t6+7 ns      t6+14 ns 
           HLDA, CACHE, LOCK#, INTA#                t6+6 ns      t6+8 ns 
           ALE#                                     t6+10 ns     t6+20 ns 
t7       ALE# width                                 t7-6.5 ns 
t8       ALE# disable delay                         t8+ ns       t8+14 ns 
t9       output float delay 
           A/D 0:31                                 t9+5 ns      t9+22 ns 
           DT/R#, DEN#, BE0-3#, ADS#, W/R#          t9+7 ns      t9+15 ns 
           HLDA, CACHE, LOCK#, INTA#                t9+6 ns      t9+8 ns 
t10      input setup 1 
           A/D 0:31                                 t10+2 ns 
           BADAC#, INT0-3# deassertion              t10+14 ns 
t11      input hold 
           A/D 0:31, HOLD                           t11+6 ns 
           BADAC#, INT0-3#, READY#                  t11+7 ns 
t16      reset setup time                           t16+6 


*Symbol refers to 80960MC specification 


TABLE 3. ICE-960MC Emulator DC Specifications 


Symbol* Parameter Maximum 


PM-Icc Supply current with 80960KB-20 1400mA 
OIB-Icc Supply current PM-Icc + 1100mA 
REM-Icc Supply current PM-Icc + 1800mA (1700 Total Typical) 
(without OIB installed)    (with OIB installed)    (with REM installed) 
Iih       Iil              Iih       Iil           Iih       Iil 
Maximum   Maximum          Maximum   Maximum       Maximum   Maximum 


AD(0:31), ADS#, DEN#, W/R#, CLK2, RESET, BE(0:3)#, READY#, ALE#, 
DT/R#, INT0#, INT3#, INT1, INT2, BADAC#, LOCK#, HOLD, FAILURE# 


100 µA   140 µA   40 µA   140 µA   80 µA 
120 µA 
0.7 mA 
Driven by 74AS760 w/ 4.7k pull-up 
150 µA   130 µA   250 µA   10 µA   750 µA   20 µA 
1.7 mA   2.9 mA   0.3 mA   0.1 mA   0.8 mA   0.5 mA 
POWER SUPPLY 


100-120V or 220-240V (Selectable) 
50-60 Hz 
2 amps (AC Max) @ 120V 
1 amp (AC Max) @ 240V 


ENVIRONMENTAL 
CHARACTERISTICS 


Operating Temperature    10°C to 40°C (50°F to 104°F) 
Operating Humidity       Maximum 85% Relative Humidity, non-condensing 


ORDERING INFORMATION 


Order Code      Description 


ICE960MC        The complete 20 MHz ICE-960MC emulator system 
                including control unit, processor module, power 
                supply, SAST, OIB, SAB, serial communications cable 
                (SCOM4), IEDIT, V1.0 software. (Requires software 
                license, Class I) 

ICE960MC25P     25 MHz ICE960MC as described above 

I960MCUPG       Conversion kit to convert ICE-960KB to ICE-960MC. 
                Consists of new host and probe software, probe 
                firmware, and manual. Requires ICE-960KB V2.0 
                hardware. 

ICE960KBREM     Optional 2 Mbyte Relocatable Expansion Memory Board. 
QT960 EVALUATION AND PROTOTYPING 


270743-1 


LOW COST EVALUATION TOOL 


The QT960 products give you a 32-bit starter kit to begin software evaluation and 
hardware design at a low cost. The boards feature the 20 MHz 80960KB 32-bit embedded 
processor. The 80960KB has integrated floating point, instruction and register caches, 
and an on-chip interrupt controller. The 80960K-series are the first in a new 
architectural family of embedded processors from Intel built using Intel’s CHMOS IV† 


process. These boards provide you with full access to the features of the 80960KB 
processor. A wire wrap prototyping area offers you easy access to board features to test 
your designs. Interleaved EPROM means fast execution of your code taking advantage of 
the 80960KB’s burst bus. A programmable wait state generator simulates different 
memory environments useful in evaluating the performance of your code. These features 
make the QT960 boards useful low cost tools for the 32-bit embedded designer. 


Once written, you can debug your program with NINDY, an EPROM resident debug 
monitor. NINDY enables you to download code, set seven different trace modes, display 
and modify memory or registers, and disassemble problem code sequences. 


Available separately from Intel are the ASM-960 (assembly language) and iC-960 
(high-level language) products which provide you with the code development environment for 
the QT960 boards. 


The starter kit comes in two versions: the QT960F version has fast SRAM, high speed 
EPROM and Flash memory; the QT960E version has lower cost SRAM, Flash memory 
and no high speed EPROM. Each version has NINDY in either EPROM (QT960F) or 
Flash memory (QT960E), power supply cable, and the QT960 User Manual. Both versions 
also include the parts list, source code of the debug monitor, and the board data base 
(schematics) all on diskette. Armed with this starter kit you now have a system to 
evaluate and prototype your product ideas quickly and at low cost. 


November 1990 
QT960 FEATURES 

• 20 MHz Execution Speed 
• 128K Bytes of Zero Wait State EPROM‡ 
• 128K Bytes of Flash Memory 
• 128K Bytes of Zero Wait State SRAM‡ 
• Programmable Wait State Generator 
• Prototyping Wire Wrap Area 
• Five Instruction Traces 
• Two Hardware Breakpoints 
• Display/Modify Memory and Registers 
• Code Disassembly 
• High Level Language Support 
• RS-232 Communications Link 
• The QT960E Version has 128K Bytes of Two Wait State SRAM and 128K Bytes of Four Wait 
  State Flash Memory 


Product Order Codes: EVQT960F20 and EVQT960E20 


†CHMOS IV is a patented Intel process. 
‡QT960F Version only. 


FAST AND EASY CODE UPDATES 


128K Bytes of Intel’s 28F256 Flash memory provides an easy and fast method of changing your 
code in nonvolatile memory. Flash memory may be conveniently reprogrammed without 
removing it from the board while software is under development. 


FAST EPROM 


Interleaved fast EPROM (Intel’s 27C202) on the QT960F version provides one-zero-zero-zero wait 
state code access. It efficiently utilizes the four word burst capabilities of the 80960KB bus, 
maximizing program performance. 


PROTOTYPING SUPPORT 


A prototyping wire wrap area is provided on board with access to the system’s signals and buses. 
This area gives you access to the board’s features and allows you to easily test design ideas. A 
system bus connector is also provided for off board prototyping. 


PROGRAMMABLE WAIT STATE GENERATOR 


A software programmable wait state generator enables you to accurately model various memory 
speeds. Under software control you can set over 16 different wait state combinations and evaluate 
the performance of your target system. 
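
To see why the wait state setting matters for burst performance, a rough estimate can be made in a few lines. This is a simplified sketch, not a model of the board: it assumes a 20 MHz bus clock (taken from the stated execution speed), counts one data cycle per word plus that word's wait states, and ignores address and recovery cycles. The function name is hypothetical.

```python
CLOCK_MHZ = 20.0  # QT960 execution speed from the feature list


def burst_time_ns(wait_states, clock_mhz=CLOCK_MHZ):
    """Estimate the data-phase time of one burst in nanoseconds.

    Assumption: each word costs one data cycle plus its wait states;
    address and recovery cycles are left out of the estimate.
    """
    cycle_ns = 1000.0 / clock_mhz
    cycles = sum(1 + w for w in wait_states)
    return cycles * cycle_ns


fast_eprom = burst_time_ns((1, 0, 0, 0))  # one-zero-zero-zero pattern
slow_memory = burst_time_ns((2, 2, 2, 2))  # a slower memory for comparison
print(fast_eprom, slow_memory)
```

Comparing a few patterns this way gives a first-order feel for how much a slower target memory would cost before measuring real code on the board.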


DMA 


The board offers you eight DMA channels accessed through a NINDY library function using 
Intel’s 82380. In addition, off board connectors provide DMA I/O capabilities. 


FIVE INSTRUCTION TRACES AND TWO HARDWARE 
BREAKPOINTS 


NINDY utilizes the built-in trace capabilities of the 80960KB to provide you with single step, 
supervisor, call, return, and branch instruction tracing, offering you extensive debug capabilities 
for software examination and modification. Two hardware breakpoints enable you to break on 
and examine EPROM resident code. 
HIGH LEVEL LANGUAGE SUPPORT 


NINDY is capable of downloading absolute object code generated by ASM-960 or iC-960. ASM-960 
and iC-960 may be purchased separately from Intel. 


COMMUNICATION AND SOFTWARE REQUIREMENTS 


The QT960 boards communicate with the host through the RS-232 link using an Intel 82510 
UART provided on board. The boards support five baud rates: 1200, 2400, 9600, 19200, and 38400. 
The default is 9600 baud. To communicate with the QT960 boards you must meet the following 
minimum software requirements: 


• Terminal Emulator 
• XMODEM Download Capabilities 
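
For readers unfamiliar with XMODEM, the sketch below shows how a host might frame data into classic checksum-mode XMODEM packets (128-byte blocks). It is a generic illustration of the protocol's packet layout only; the text does not state which XMODEM variant NINDY expects, and the function name is hypothetical. The ACK/NAK handshake and serial port handling are omitted.

```python
SOH = 0x01        # start-of-header byte for a 128-byte block
PAD = 0x1A        # padding byte (SUB / Ctrl-Z) for a short final block
BLOCK_SIZE = 128


def xmodem_packets(data: bytes):
    """Yield checksum-mode XMODEM packets for `data`.

    Packet layout: SOH, block number, one's complement of block number,
    128 data bytes, 8-bit arithmetic checksum of the data bytes.
    """
    block_number = 1  # block numbers start at 1 and wrap modulo 256
    for offset in range(0, len(data), BLOCK_SIZE):
        chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, bytes([PAD]))
        checksum = sum(chunk) & 0xFF
        number = block_number & 0xFF
        yield bytes([SOH, number, 0xFF - number]) + chunk + bytes([checksum])
        block_number += 1


packets = list(xmodem_packets(b"A" * 130))
print(len(packets), len(packets[0]))  # 2 packets of 132 bytes each
```

Each packet is 132 bytes on the wire, so a terminal program with XMODEM support handles this framing and the retry handshake on the host side.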
Block Diagram of the QT960 Board (blocks shown: 80960KB CPU, wait state generator, system 
support, address latches, 82380, 82510, flash memory 128K bytes, SRAM 128K bytes, wire wrap 
pins, off board connector; drawing 270743-2) 


For information or the number of your nearest sales office call 800-548-4752 (U.S. and Canada). 
Intel Corporation, Literature Department, 3065 Bowers Avenue, Santa Clara CA 95051, United States. Tel: 408-987-8080. 
DB960CADIC 


Intel’s DB960CADIC, the in-circuit debug monitor for the 33 MHz i960CA embedded 
microprocessor, represents a new generation of development tool technology. 


DB960CADIC allows users to debug high-speed, cached applications at the full speed of 
the i960CA target processor. Controlled by Intel’s DB interface, DB960CADIC offers the 
user a tool with a powerful feature set at a fraction of the cost of traditional development 
tools. DB960CADIC is designed to improve productivity by allowing the user to debug 
software before and after the target system arrives, with minimal hardware intrusion. 




Features 

• Real-time emulation of the i960CA 
  embedded microprocessor at speeds up to 
  33 MHz 

• Full development and debug support for 
  i960CA on-chip cache and RAM 

• Minimal intrusive operation, allowing 
  the user to debug the target system with 
  minimal modification subject to initial 
  design constraints 

• Breakpoint capabilities include ten 
  software breakpoints, two hardware 
  execution address breakpoints, and two 
  hardware data address breakpoints. The 
  human interface supplements these 
  breakpoints with the ability to break on 
  data values, conditions, and a four-state 
  state machine in non-real time. 

• Low-Cost 

• Source-Level, Symbolic Debugging in a 
  Windowed Human Interface with pull 
  down Menus (DB). This interface is 
  consistent across i960CA tools. 

• 128K Bytes User Memory 

• Virtual I/O, the ability to perform I/O 
  between the DB960CADIC unit and the 
  host 

• In-Circuit operation facilitates easy 
  transition between target systems 

• Optional Stand-Alone Self-Test 
  (DB960CASAST) Module 

• Optional Logic Analyzer Interface Board 
  (LAI960CA) 


June 1991 
DB960CADIC IN-CIRCUIT DEBUG MONITOR 


Full-Speed Debug and 
Development 


The DB960CADIC In-Circuit Debug Monitor 
provides sophisticated real-time hardware and 
software debug capabilities for i960CA 
embedded microprocessor-based designs. The 
user can run at the full speed of the target 
processor, ensuring that elusive timing bugs 
will be found. The DB960CADIC is jumpered to 
receive a clock pulse from either the user’s 
target system, or from an internal 25 MHz 
clock. 


Ideal for All Stages of 
Development 


DB960CADIC can be used by both hardware 
and software developers, at any stage of design. 
Early in the development process, 
DB960CADIC allows software debugging when 
inserted into an existing i960CA board such as 
the DB960CASAST module or the EV80960CA 
board. Later in the design cycle, DB960CADIC 
can be inserted into the user’s target system, 
thus facilitating debug of hardware/software 
integration. 


Speed Development with Source 
Code, Symbolic Debugging 


Using source code oriented debugging in a 
windowed, symbolic interface, software 
engineers can increase productivity by 
debugging in the medium they are familiar 
with, software source. 


Commands can be entered via either function 
keys, pull-down menus which group logically 
related commands, or a supplementary 
command line which allows entry of complex 
conditions. In addition, source code symbolics 
can be used to examine and modify memory 
and registers. Optimal symbolic debugging can 
be achieved when using DB960CADIC with 
genuine Intel languages. 


Powerful Break Capabilities 


DB960CADIC provides complex emulation 
control by utilizing the on-chip debug registers 
within the i960CA. Real-time break 
capabilities include the ability to break on any 
two execution addresses or data access 


addresses in hardware. Software breakpoints 
are also used to supplement the hardware 
breakpoints for RAM-based memory 
subsystems. DB960CADIC extends these 
capabilities by providing the ability to break 
on data values, NOT data values, or 
combinations of the above in a four-state state 
machine. More complex conditions such as 
breaking when a variable is less than a certain 
value can be entered via a very flexible feature 
called conditional breakpoints. 


128K Bytes User Memory 


DB960CADIC provides the user with 128K 
bytes of memory in Region F of the i960CA 
target space. Since the debug monitor is also 
placed in Region F, the on-chip bus interface 
unit of the i960CA is configured to address 
Region F as byte-wide memory with 5 
waitstates and no burst accesses allowed. 


Virtual Input/Output 


DB960CADIC is shipped with documented
library calls which provide users with a built-in
mechanism for performing target I/O using
the host system. These libraries provide the
ability to simulate I/O operations in the target
system before target hardware is available.


High Speed Serial Link 


Communication between a host and the
DB960CADIC module is supported via RS232
and RS422 communication links. RS232 allows
access to industry standard serial protocols
while the RS422 link provides a higher speed
communication mechanism currently
emerging in the development market. PC/AT
compatible RS422 communication boards are
available from various third party vendors.


Optional Stand-Alone Self Test 
Chassis 


An optional stand-alone self test chassis
complements DB960CADIC by allowing the
user to debug and test code before prototype
hardware is available. The DB960CASAST
includes self-test circuitry to ensure that the
DB960CADIC unit is working correctly. It also
provides 4 Mbytes of DRAM to be used for
developing applications. This memory has a
(3,1,1,1) waitstate pattern at 25 MHz. This
waitstate pattern is programmable using the
bus controller unit in the i960CA. It also
includes an 8254 programmable timer which
can optionally interrupt the i960CA processor
and provide the ability to time code sequences.


Optional Logic Analyzer Interface 
Board 


The LAI960CA board provides access to 
i960CA pins by routing the signals to easily 
accessible stake pins while passing them 
through to the target system. 


Software Completes the System 


Intel provides a comprehensive software
development environment to complement
DB960CADIC. This environment includes C
and ASM source languages, a retargetable
debug monitor, and DB960CADIC. The
languages support the entire range of 80960
embedded processors.


Worldwide Service, Support, and 
Training 


To augment its development tools, Intel offers 
a full array of seminars, classes, workshops, 
field application engineering expertise, hotline 
technical support, and on-site service. 


Intel also offers a Software Support contract 
which includes technical software information, 
telephone support, automatic distributions of 
software and documentation updates, 
iCOMMENTS publication, remote diagnostic 
software, and a development tools 
troubleshooting guide. 


Intel's 90-day Hardware Support package
includes technical hardware information,
telephone support, warranty on parts, labor,
material, and on-site hardware support.


Intel Development Tools also offers a 30-day, 
money-back guarantee to customers who are 
not satisfied after purchasing any Intel 
development tool. 


Host System Requirements 


Host system requirements to run the in-circuit 
debugger include the following: 


• DOS version 3.2 or later, excluding DOS 4.0
• 640K bytes of RAM in conventional memory
• A 20 MB hard disk
• An RS232 or RS422 serial port

Evaluated systems include:

• IBM PC-AT* with DOS 3.3
• COMPAQ 386* with DOS 3.3
• Intel 301/302* with DOS 3.3
• IBM Personal System/2* Model 70/80 with DOS 4.01


Environment Characteristics 


Operating Temperature: +10°C to +40°C (50°F to 104°F)

Operating Humidity: Maximum of 90% relative humidity, non-condensing.
Figure 1

Serial Cable

Figure 2


DB960CADIC Target System
Considerations


Target systems intended to receive
DB960CADIC must meet the following
requirements:

• The target system must not respond to
  memory accesses in Region F (0F0000000-
  0FFFFFFFF) with DB960CADIC installed.
  DB960CADIC provides an ACTIVE out
  signal which can be used to qualify bus logic
  to prevent this occurrence when
  DB960CADIC is installed.

• The target system must provide 1.3 Amps of
  power (worst case), 0.9 Amps average, to power
  the DB960CADIC unit.

• Use of one of the nine directly accessible
  i960CA interrupts.

• Use of interrupt table entry 242 or 248.

• Additional signal loading as follows:


The DB960CADIC makes use of the PCLK
outputs, D0 through D7, and some of the
address and control signals of the processor.
The following table lists the worst case
loadings added by the presence of the
DB960CADIC circuitry.


Signal         DC Load      Capacitive
Name           (µA)         Load (pF)

PCLK1          +25/-250
PCLK2          +30/-255
CLKIN                       12
D0:D7          +20/-600
A31:26         +25/-250
A2:A17         +20/-100
BE0*, BE1*     +20/-100
ADS*           +50/-500
W/R*           +50/-500
WAIT*          +25/-250
BLAST*         +25/-250
FAIL*          +20/-20
RESET*         +15/-15
INT0:7*        +20/-500
NMI*           +20/-500

Additional Loading on the Target by
the DB960CADIC
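
The Region F requirement listed above can be illustrated with a small address-decode model. This is a minimal sketch in Python, not Intel-supplied code: the function names and the `dic_active` flag (standing in for the ACTIVE out signal) are hypothetical, and the only facts taken from the text are the Region F address range and the rule that the target must stay silent there while DB960CADIC is installed.

```python
REGION_F_BASE = 0xF0000000  # first address of Region F, from the range quoted above
REGION_F_LAST = 0xFFFFFFFF  # last address of Region F


def in_region_f(address: int) -> bool:
    """Return True when a 32-bit address falls inside Region F."""
    return REGION_F_BASE <= (address & 0xFFFFFFFF) <= REGION_F_LAST


def target_may_respond(address: int, dic_active: bool) -> bool:
    """Hypothetical decode rule: the target must not answer Region F
    accesses while the debug monitor asserts its ACTIVE output."""
    return not (dic_active and in_region_f(address))
```

In a real design this qualification would live in the target's address-decode logic; the model is only useful for checking a memory map on paper before the hardware exists.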


Ordering Information 


DB960CADIC     In-circuit debug monitor for
               the i960CA embedded
               microprocessor. Operates at
               speeds up to 33 MHz. Includes
               hardware debug module,
               RS232/RS422 serial cables,
               DOS host software, and
               documentation.

DB960CASAST    Stand-Alone Self Test Unit for
               DB960CADIC. Includes built-in
               power supply, self-test
               board, 4 Mbytes of usable
               DRAM for code development,
               and enclosure.

DB960CAST      DB960CADIC and
               DB960CASAST as described
               above.

LAI960CA       Optional Logic Analyzer
               Interface Board for the
               i960CA system. Does not
               require DB960CADIC.
INTEL DEVELOPMENT TOOLS SOFTWARE
SERVICES


Intel is committed to providing high quality products and customer support. Our
commitment to quality is demonstrated by a 30 day, money-back, unconditional refund to
customers not satisfied with their purchase of an Intel Development Tools product.


Intel supports its customers by offering a 90-day software warranty and standard 
software support including free technical support over the phone. 


Intel software is continuously undergoing improvement. For customers who desire the 
security of having the most current software and the convenience of having updates sent 
automatically, Intel offers inexpensive Software Support Contracts. 


SOFTWARE WARRANTY

The standard software warranty is 90 days
and entitles the customer to the following
(provided the customer has registered their
software by returning a completed
Warranty Registration Card):

• Replacement of defective media
• Software product updates occurring
  within the 90 day warranty period.

STANDARD SOFTWARE SUPPORT

Standard Software Support, provided at no
additional cost, offers the following
additional benefits:

• Free Technical Information Phone
  Service ("TIPS")
• Timely response to Software Problem
  Reports
Software Support Contracts

Software Support Contracts cover products for
one year from the date of purchase and are
renewable annually. The following benefits are
provided:

• Automatic Software Updates
• Standard Software Support
• Remote Diagnostic Software for DOS-based
  products.
• Monthly issues of iCOMMENTS, a technical
  support publication
• Quarterly issues of Troubleshooting Guides
  (host-specific)
• Quantity discounts


ORDERING INFORMATION

Ordering Procedures

For more information, call 1-800-468-3548 or
your local Intel sales office. Similar support
offerings are available outside of North
America. Software Support Contracts are
available for North American customers only.
All orders for contracts, including renewals,
can be submitted through the local Intel sales
office or directly to the Development Tools
Operation by calling 1-800-874-6835.

To order a Software Support Contract, a
customer must have registered their product
or provide proof of ownership. Customers must
also have the most current version of the
software; otherwise, they must order a product
upgrade before a support contract may be
purchased.

Pricing Information

Pricing is a percentage of the List Price, based
on the number of copies covered by the
Software Support Contract. For emulators, the
percentage will be applied to the identified list
price of the software portion only, not the full
list price of the emulator.

Quantity discounts are:

Product quantity     Price
1-10 copies          20% of List Price
11-25 copies         15% of List Price
26+ copies           10% of List Price

VAX and MicroVAX software not included.
Please call 1-800-874-6835.

Ordering Information

Order code       Description
SWSUPPORT51      Software Support Contract for 51 family
SWSUPPORT86      Software Support Contract for 86 family
SWSUPPORT96      Software Support Contract for 96 family
SWSUPPORT286     Software Support Contract for 286 family
SWSUPPORT386     Software Support Contract for 386 family
SWSUPPORT486     Software Support Contract for 486 family
SWSUPPORT960     Software Support Contract for 960 family
iRMK™ 960
REAL-TIME KERNEL

■ 32-Bit Real-Time Multi-Tasking Kernel for the i960™ Microprocessor Family
■ Flexible, Modular Design to Ease System Integration
■ Fast Execution with Predictable Response Time for Time-Critical Applications
■ Compact Code Size (14 Kbytes, Including All Optional Modules)
■ Requires Only an i960 KA, KB or MC Embedded Processor
■ Bus Independent
■ Easy Customization and Add-On Enhancements
■ Easily EPROMmable
■ Comprehensive Development Tool Support


The iRMK 960 Real-Time Kernel is the 32-bit real-time executive developed and supported by Intel, the i960
architecture experts. The kernel is a small, fast and highly modular package of system control software. It
contains the basic software building blocks that act as the foundation in using the key features of the i960
microprocessor. The iRMK 960 software is fully supported by an array of tools that work in the most popular
development environments (i.e., DOS*, VAX/VMS*, SUN*).


The iRMK 960 Real-Time Kernel is available off-the-shelf. The kernel reduces the cost and risk of designing
and maintaining software for numerous real-time applications such as embedded control systems and
dedicated real-time subsystems in multiprocessor environments. Use of the kernel can save man-years that
might otherwise be spent developing or porting another real-time kernel. This means reduced time to market
for the user.


*DOS® is a registered trademark of Microsoft Corporation.
VAX/VMS™ is a trademark of Digital Equipment Corporation.
SUN™ is a trademark of Sun Microsystems.
ARCHITECTURAL OVERVIEW


At the heart of the architecture are the kernel core
modules consisting of a scheduler, task manager,
interrupt manager and time manager (See Figure 1).
As additional building blocks, the kernel provides
optional modules consisting of a mailbox manager,
semaphore manager, memory manager, on-processor
interrupt controller manager and fault handler
manager. The optional device managers for the
82380 Integrated System Peripheral (ISP) and 8254
Programmable Interval Timer (PIT) complete the
architecture.


FUNCTIONAL FEATURES


A Full Set of Real-Time Building Blocks


The kernel provides a full set of services for real-time
applications including task management, time
management, synchronization of and communications
between tasks, and memory pool management.


TASK MANAGEMENT


The iRMK 960 kernel uses system calls to create,
manage and schedule tasks in a multi-tasking
environment. It provides pre-emptive priority scheduling
combined with optional time-slice (round robin)
scheduling.


The scheduling algorithm used by the kernel enables
tasks to be rescheduled in a fixed amount of
time regardless of the number of tasks. Applications
may contain any number of tasks.


An application can integrate optional task handlers
to customize task management. These handlers can
execute on task creation, task switch, task deletion
and task priority change. Task handlers can be used
for a wide range of functions, including saving and
restoring the state of coprocessor registers on task
switch, masking interrupts based on task priority or
implementing statistical and diagnostic monitors.


INTERRUPT MANAGEMENT


iRMK 960 interrupts are managed by
switching control to user-written interrupt handlers
when an interrupt occurs.
Response to interrupts is both fast and predictable.
Most of the kernel's system calls can be executed
directly from interrupt handlers.


[Figure 1. iRMK™ 960 Real-Time Architecture. Block labels: Application;
Language Interface Libraries; Kernel Core Modules; Kernel Optional
Modules; User-Supplied System Routines; Kernel Supplied Device
Managers; Hardware]


TIME MANAGEMENT 


The time management features included in the ker- 
nel provide single-shot alarms, repetitive alarms and 
a real-time clock. In addition, alarms can be reset. 


These time management facilities can solve a wide 
range of real-time programming problems. Single- 
shot alarms, for example, can be used to handle 
timeouts. If the timeout occurs, the alarm invokes a 
user-written handler; if the event occurs before the 
timeout, the application simply deletes the alarm. 
Other uses for the kernel's time management facilities
include polling devices with repetitive alarms,
putting tasks to sleep for specified periods of time,
or implementing a time-of-day clock.
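
The timeout pattern described above (arm a single-shot alarm, then delete it if the event arrives first) can be modeled in a few lines. This is a toy simulation in Python, not kernel code: the class, its methods and the tick-driven loop are hypothetical stand-ins, and the kernel's actual call signatures are not given in this document.

```python
class SingleShotAlarm:
    """Toy single-shot alarm advanced by explicit clock ticks."""

    def __init__(self, ticks, handler):
        self.remaining = ticks    # ticks left before the alarm fires
        self.handler = handler    # user-written handler to invoke
        self.active = True

    def tick(self):
        """Advance one clock tick; fire the handler once on expiry."""
        if not self.active:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.active = False
            self.handler()

    def delete(self):
        """Cancel the alarm so the handler never runs."""
        self.active = False


def wait_for_event(event_at_tick, timeout_ticks, total_ticks=10):
    """Return ["event"] if the event beats the timeout, else ["timeout"]."""
    outcome = []
    alarm = SingleShotAlarm(timeout_ticks, lambda: outcome.append("timeout"))
    for now in range(1, total_ticks + 1):
        if now == event_at_tick and alarm.active:
            alarm.delete()            # event arrived first: drop the alarm
            outcome.append("event")
        alarm.tick()
    return outcome
```

An event at tick 3 with a 5-tick timeout cancels the alarm; an event at tick 8 arrives too late and only the timeout handler runs.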


INTERTASK SYNCHRONIZATION AND 
COMMUNICATION 


Semaphores, regions and mailboxes are the key
mechanisms the kernel uses for synchronizing tasks
and communicating between tasks.


Semaphores are objects used for intertask signaling 
and synchronization. Tasks exchange abstract 
“units” with semaphores as a means of becoming 
synchronized. A task requests a unit from a sema- 
phore to gain access to a resource. If the resource is 
available, the semaphore will have a unit to give to 
the task, enabling the task to proceed. A task sends 
a unit to a semaphore to indicate that it has released 
a previously obtained resource. 
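
The exchange of units described above can be sketched as a small model. This Python class is an illustration under stated assumptions, not the kernel's implementation: the class name is hypothetical, the method names merely echo the send/receive wording of the text, and waiting tasks are queued first-in, first-out for simplicity.

```python
from collections import deque


class UnitSemaphore:
    """Toy semaphore that hands out abstract units to tasks."""

    def __init__(self, initial_units):
        self.units = initial_units   # units currently available
        self.waiting = deque()       # tasks blocked waiting for a unit

    def receive_unit(self, task):
        """Request a unit. Return True if the task may proceed,
        False if it has been queued to wait."""
        if self.units > 0:
            self.units -= 1
            return True
        self.waiting.append(task)
        return False

    def send_unit(self):
        """Release a unit. If a task is waiting, the unit passes
        straight to it and that task is returned; otherwise the
        unit goes back to the semaphore and None is returned."""
        if self.waiting:
            return self.waiting.popleft()
        self.units += 1
        return None
```

With one initial unit, the first requester proceeds, the second waits, and the next `send_unit` wakes the waiter rather than incrementing the count.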


A special binary type of semaphore is called a Re- 
gion. Regions are used to ensure mutual exclusion, 
thus preventing deadlock when tasks contend for 
control of system resources. A task holding a re- 
gion’s unit runs at the priority of the highest priority 
task waiting in queue for the region’s unit. 
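
The priority rule for regions can be expressed as a one-line calculation. The sketch below is hypothetical Python, assuming that a larger number means a higher priority and that a holder never drops below its own priority; neither convention is stated in this document.

```python
def effective_priority(holder_priority, waiting_priorities):
    """Priority at which the holder of a region's unit runs.

    Assumptions (not from the source): larger numbers mean higher
    priority, and the holder keeps its own priority when no waiter
    outranks it.
    """
    return max([holder_priority, *waiting_priorities])
```

A holder at priority 2 with waiters at 5 and 9 would run at 9 until it releases the region, so the priority-9 task is not held up behind intermediate-priority work.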


Mailboxes are queues that can hold any number of 
messages and are used to exchange data between 
tasks. Either data or pointers can be sent using mail- 
boxes. The kernel allows mailbox messages to be of 
any length. High priority messages can be placed 
(jammed) at the front of the message queue to en- 
sure that they are received and processed before 
other messages queued at the mailbox. 
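
The difference between a normal send and a jammed (priority) send comes down to which end of the queue receives the message. A minimal Python model follows; the class is hypothetical, its method names only echo the kernel's call names, and the real calls' parameters are not described here.

```python
from collections import deque


class Mailbox:
    """Toy mailbox: a message queue with a priority 'jam' operation."""

    def __init__(self):
        self.messages = deque()

    def send_data(self, message):
        """Normal send: append at the tail of the message queue."""
        self.messages.append(message)

    def send_priority_data(self, message):
        """Jam: place the message at the head of the queue."""
        self.messages.appendleft(message)

    def receive_data(self):
        """Take the message at the head, or None if the queue is empty."""
        return self.messages.popleft() if self.messages else None
```

Sending "a" and "b" and then jamming "urgent" yields the receive order "urgent", "a", "b".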


To ensure that high priority tasks are not blocked by 
lower priority tasks, the kernel allows tasks to queue 
at semaphores and mailboxes in priority order. The 
kernel also supports first-in, first-out task queueing. 
MEMORY POOL MANAGEMENT


The iRMK 960 kernel uses the concept of memory 
pools to efficiently divide and manage blocks of 
memory. The memory pool manager provides for 
both fixed and variable block allocation. 


Memory can be divided into any number of pools. 
Multiple memory pools might be created for different 
speed memories, or for allocating different size 
blocks. The times to allocate and de-allocate fixed- 
size areas from within a pool have a fixed upper 
bound. 
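
One common way to obtain a fixed upper bound on allocation time for fixed-size areas is a free list, sketched below in Python. This illustrates the general technique only; it is an assumption, not a description of how the iRMK 960 memory manager is built, and all names are hypothetical.

```python
class FixedBlockPool:
    """Toy fixed-size block pool backed by a free list."""

    def __init__(self, base, block_size, block_count):
        self.block_size = block_size
        # Pre-compute every block address once, at pool creation.
        self.free = [base + i * block_size for i in range(block_count)]

    def create_area(self):
        """Take one block off the free list, or None if the pool is empty.
        The cost is a single pop, independent of pool size."""
        return self.free.pop() if self.free else None

    def delete_area(self, address):
        """Return a block to the pool with a single append."""
        self.free.append(address)
```

Separate pools, for instance one per memory speed or per block size, are simply separate instances of the class.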


The kernel-supplied memory manager works with a
flat memory architecture. Users can also write their
own memory manager to provide different memory
management policies or support virtual memory.


Hardware Requirements and Support 


The kernel requires only an i960 microprocessor and 
sufficient memory for itself and its application. The 
kernel’s design, however, recognizes that many sys- 
tems use additional programmable peripheral devic- 
es and coprocessors. The kernel provides optional 
device managers for: 


• The 82380 Integrated System Peripheral (ISP)
  chip
• The 8254 Programmable Interval Timer (PIT) chip


An application can supply managers for other devic- 
es and coprocessors in addition to or in replacement 
of the devices listed above. 


The openness of the iRMK 960 kernel is a major
benefit to the OEM. The kernel is designed to be
programmed into PROM or EPROM, making it easy
to use in embedded designs. In addition, it can be
used with any system bus, including those of
MULTIBUS I and MULTIBUS II bus architectures.


A Modular Architecture for Easy 
Customization 


The kernel is designed for maximum flexibility. It can 
be customized for any application. Each major func- 
tion, mailboxes for example, is implemented as a 
separate module. The kernel’s modules have not 
been linked together and are supplied individually. 
(See Table 1 for the list of kernel modules, and their 
approximate sizes.) 


The user links only the modules needed for the
application. Any module not used does not need to be
linked in, and does not increase the size of the
kernel in the application. The user can also replace
any optional kernel module with one that implements
specific features required by the application.
For example, the user might want to replace the
kernel's memory manager with one that supports virtual
memory.


Table 1. iRMK™ 960 Kernel Modules
and Approximate Sizes

Core Modules                            Bytes
  Task Manager                           2600
  Interrupt Manager                       150
  Time Manager                           3000
  Scheduler                              1700
  Initialization                           50

Optional Modules
  Mailbox Manager                        1250
  Semaphore Manager                      2900
  Memory Manager                         1260
  Fault Handler Manager                    50
  Miscellaneous                           300

Optional Device Managers
  82380 Integrated System Peripheral     4200
  8254 Programmable Interval Timer       1200


Total size of the (entire) kernel (minus device
managers) is about 13.5 Kbytes.


Developing with the iRMK™ 960
Real-Time Kernel


Kernel applications can be written using any
language or compiler that produces code that executes
on the i960 microprocessor. This independence is
achieved by using an interface library. This library
works with the idiosyncrasies of a particular
language, for example, the ordering of parameters.
The interface library translates the calls provided by
the language into a standard format expected by the
kernel. Intel provides an interface library for our iC
960 compiler. The source code of this library is
included, so that the user can modify it to support
other compilers.


Because the kernel is supplied as unlinked object
modules, applications can be developed on any
system that hosts the development tools needed.


Comprehensive Development Tool
Support


Intel provides a complete line of 80960 development
tools for writing and debugging iRMK 960 applications.
These tools include:

Software:
  ASM 960 assembler
  iC 960 compiler

  NOTE:
  These tools are available for DOS,
  VAX/VMS*, MicroVAX*, SUN* and
  EVA960KB4MB environments.

Debuggers:
  ICE™ 960       In-Circuit Emulator for the i960
                 microprocessor
  SDM™ 960       System Debug Monitor for the i960
                 microprocessor

Evaluation Vehicles:
  EVA960KB       AT Bus-Compatible Board
  EVA960KB4MB    AT Bus-Compatible Board with
                 4 Mbytes of Memory
  QT960          Standalone Evaluation Vehicle


Intel Support, Consulting and Training 


With iRMK 960 kernel software, the developer has
available the total Intel i960 architecture and
real-time expertise of Intel's support engineers. Intel
provides telephone support, on or off-site consulting,
troubleshooting guides and updates. The kernel
includes 90 days of Intel's Technical Information
Phone Service (TIPS). Extended support and
consulting are also available.
Contents of the iRMK™ 960 Kernel
Development Package


The iRMK 960 Kernel comes in a comprehensive
package including:

• Kernel object modules
• Source for the kernel supplied 82380 Integrated
  System Peripheral and 8254 PIT device managers
• Source for the iC 960 interface library
• Source for sample applications showing the
  following:
    Structure of kernel applications
    Use of the kernel with an application written in iC
    960 language
    Compile, bind and build sequences
    Sample initialization code for the i960
    microprocessor
    Applications written to execute in a flat memory
    space
• User reference guide
• 90 days of customer support


LICENSING


iRMK 960 software requires prior execution of the
standard Intel Software License Agreement (SLA). A
single development copy requires a Class I license
and allows iRMK 960 software to be loaded and run
on one single-processor system.


SPECIFICATIONS


System Calls

The following items are system calls arranged by
type:

iRMK™ 960 KERNEL SYSTEM CALLS LISTING


KERNEL INITIALIZATION

KN_initialize             Initialize kernel


OBJECT MANAGEMENT

KN_token_to_ptr           Returns a pointer to the area holding object
KN_current_task           Returns a pointer for the current task


TASK MANAGEMENT

KN_create_task            Creates a task
KN_delete_task            Deletes a task
KN_suspend_task           Suspends a task
KN_resume_task            Resumes a task
KN_set_priority           Change priority of a task
KN_get_priority           Return priority of a task


INTERRUPT MANAGEMENT

KN_set_interrupt          Specify interrupt handler
KN_stop_scheduling        Suspend task switching
KN_start_scheduling       Resume task switching


TIME MANAGEMENT

KN_sleep                  Put calling task to sleep
KN_create_alarm           Create and start virtual alarm clock
KN_reset_alarm            Reset an existing alarm
KN_delete_alarm           Delete alarm
KN_tick                   Notify kernel that clock tick has occurred
KN_get_time               Get time
KN_set_time               Set time


INTERTASK COMMUNICATION AND
SYNCHRONIZATION

KN_create_semaphore       Create a semaphore
KN_delete_semaphore       Delete a semaphore
KN_send_unit              Add a unit to a semaphore
KN_receive_unit           Receive a unit from a semaphore
KN_create_mailbox         Create a mailbox
KN_delete_mailbox         Delete a mailbox
KN_send_data              Send data to a mailbox
KN_send_priority_data     Place (jam) priority message at head of
                          message queue
KN_receive_data           Request a message from a mailbox


MEMORY MANAGEMENT

KN_create_pool            Create a memory pool
KN_delete_pool            Delete a memory pool
KN_create_area            Create a memory area from a pool
KN_delete_area            Return a memory area to a memory pool
KN_get_pool_attributes    Get a memory pool's attributes


PROGRAMMABLE INTERRUPT
CONTROLLER MANAGEMENT

KN_initialize_PICs        Initialize PICs
KN_mask_slot              Mask out interrupts on a specified slot
KN_unmask_slot            Unmask interrupts on a specified slot
KN_send_EOI               Signal the PIC that the interrupt on a
                          specified slot has been serviced
KN_new_masks              Change interrupt masks
KN_get_slot               Return the most important active interrupt
                          slot
KN_get_interrupt          Get address of specified interrupt handler
PROGRAMMABLE INTERVAL CONTROLLER
MANAGEMENT

KN_initialize_PIT         Initialize the PIT
KN_start_PIT              Start PIT counting
KN_get_PIT_interval       Return PIT interval


PROCESSOR RECOGNIZED FAULT HANDLING

KN_get_fault_handler      Get address of fault handler currently
                          associated with specified fault
KN_set_fault_handler      Establish address of fault handler for the
                          specified fault type


PROCESSOR INTERRUPT
CONTROLLER SUPPORT

KN_get_processor_priority Returns value of the processor priority
KN_set_processor_priority Change the value of the processor priority

PERFORMANCE 


The figures listed below were derived from a test
suite running on an EVA-960 evaluation vehicle using
an 80960KB running at 20 MHz. The EVA-960 has
what is known as 2-1-1-1 wait state memory; what
this means is that the first instruction of a
four-instruction fetch takes two wait states, and each of the
three successive instructions takes one wait state.
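
The wait-state notation above translates directly into added bus time. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that each wait state lasts exactly one processor clock period.

```python
def wait_state_overhead_ns(pattern, clock_mhz):
    """Time added by wait states over one burst, in nanoseconds.

    pattern   -- wait states per access in the burst, e.g. (2, 1, 1, 1)
    clock_mhz -- processor clock frequency in MHz

    Assumption: one wait state equals one clock period.
    """
    clock_period_ns = 1000.0 / clock_mhz
    return sum(pattern) * clock_period_ns
```

Under that assumption, the 2-1-1-1 memory at 20 MHz spends five clock periods of 50 ns each, or 250 ns, waiting during every four-instruction fetch.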


The figures are the worst case values obtained from
several sets of test runs. The code was generated
using the iC 960 DOS hosted compiler, Version 1.1.


Action                     Time (in µs)
Create Pool                18
Get Pool Attributes
Delete Pool
Create Area                35
Delete Area                32
Action

Create Semaphore
Delete Semaphore
FIFO Semaphore Send Unit
FIFO Semaphore Receive Unit
Region Semaphore Send Unit
Region Semaphore Receive Unit

Create Mailbox
Delete Mailbox
Send Data
Receive Data

Create Alarm
Delete Alarm

FIFO Semaphore Send/Receive Unit with Task Switch
Suspend Task with Task Switch
Basic Task Switch

Create Task
Suspend Task
Resume Task
Delete Task
Get Priority
Set Priority

Set Interrupt
Get Interrupt


MANUALS 


Time (in µs)


6 
14 


iRMK 960 User's Manual (Intel Order #469863-001).


TRAINING INFORMATION 


Intel Customer Service Training:

"80960 KA/KB Embedded Processor Training
Course"


ORDERING INFORMATION


Ordering Code    Product Description
RMK960           iRMK 960 Real-Time Kernel
Low Cost Processor Evaluation Tool


Intel’s EV80960CA evaluation board provides a low-cost hardware environment for code 
execution and software debugging. The board features the 80960CA, the newest and 
highest performance member of Intel’s family of 32-bit embedded microprocessors. The 
board allows a user’s program to take full advantage of the power of the 80960CA and 
provides zero wait state execution of the user’s code. 


Popular features such as single line assembler/disassembler, single-step program 
execution and software breakpoints are standard on the EV80960CA’s on-board monitor. 
Available separately, Intel offers a complete code development environment using the 
assembler (ASM-960) as well as high-level languages, such as Intel’s iC-960 C compiler, to 
accelerate development schedules. 


The EV80960CA evaluation board package features the 80960CA System Debug Monitor 
(SDM) in EPROM, a SDM host software floppy disk, a power supply cable, a 9-pin PC/AT 
serial connector for terminal and the EV80960CA User’s Manual. The EV80960CA 
User’s Manual includes schematics of the board, a part list and programmable logic 
(PLD) equations. The board is hosted on an IBM or BIOS-compatible PC/AT. 


*The SRAM memory system provides zero wait state read (0-0-0-0-0) and one wait state write (1-1-1-1-0) performance. 
**The DRAM memory system provides 2-1-1-1-1 reads and writes. 
EV80960CA Evaluation Board


EV80960CA Features

• 25 MHz Execution Speed
• 32 Kbytes of EPROM for 80960CA SDM
  Target Operating Firmware
• 64 Kbytes of Zero Wait State Pipelined
  SRAM*
• 1 Mbyte of Static-Column Mode DRAM**
  expandable to 4 Mbytes
• Concurrent Interrogation of Memory and
  Registers
• Software Breakpoints
• Code Disassembly
• High-Level Language Support
• Two RS-232s for Host and User
  Communication
• Two iSBX I/O Connectors
• An Expansion Bus to Accommodate
  Eurocard Form-Factor Prototyping Boards


Fast Pipelined SRAM Memory 
System 


The pipelined-read memory system of the
EV80960CA provides true zero wait state read
and one wait state write performance. The memory
design utilizes the internal wait state
generator of the 80960CA.



Fast Static-Column Mode DRAM 


The memory design of the EV80960CA uses
the 80960CA burst mode bus and static-column
DRAM mode. The DRAM control PLDs are
functionally isolated into interconnected state
machines. The PLDs can be changed to allow
alternative DRAM memory implementations
with different DRAM access modes
(static-column mode, nibble mode or fast-page mode).


Concurrent Interrogation of 
Memory and Registers 


The 80960CA System Debug Monitor (SDM) for 
the EV80960CA allows the user to read and 
modify internal registers and external memory 
while the user’s program is running on the 
board. 


iSBX I/O Connectors and
Expansion Interface


The EV80960CA evaluation board has two
connectors to support both 8- and 16-bit
standard iSBX Expansion Modules. The board
also provides an expansion bus to
accommodate Eurocard form-factor
prototyping boards.


[Block Diagram of the EV80960CA Board. Block labels: I/O expansion with
standard iSBX boards; boot EPROM; sense switches; iSBX expansion; signal
generator; buffer; 64 Kbytes 0-wait state SRAM; 32-bit expansion bus;
fault and user LEDs; timer/counters; UARTs; host interface port; user port]
Communication Link

The EV80960CA board communicates with the
host through the RS-232 link using an Intel
82510 UART provided on board. The board
supports seven baud rates: 300, 1200, 2400,
4800, 9600, 19200 and 38400.

Power Requirements

The EV80960CA Evaluation Board requires 5V
at 2000 mA and +12V at 25 mA.

Host System Requirements

The EV80960CA Evaluation Board is hosted on
an IBM PC/AT or compatibles; a 386-based PC
is recommended. The host system must meet
the following minimum requirements:

• 512 Kbytes of Memory
• One 1.2 Mbyte Floppy Disk Drive
• PC-DOS 3.2 or Later
• Serial Port (COM1 or COM2)
i960™ SA/SB EVALUATION BOARD


The EV80960SX board is a general purpose evaluation tool for the i960™ SA/SB
embedded processors. This evaluation board provides a high-performance DRAM
subsystem, an interleaved EPROM subsystem, and a robust set of peripheral devices for
benchmarking and debugging application code written for the i960 SA/SB embedded
processors.


The EV80960SX is a great starter kit for your 32-bit application. The EV80960SX, 
NINDY debug environment, along with assembler and C-compiler (not provided) provide 
a seamless environment for developing code and evaluating the i960 SA/SB processors. 
The NINDY monitor provides code download capabilities from a number of popular 
development systems, including DOS-based PC’s. Single step, breakpoints, register and 
memory display are among the full set of features provided by NINDY. 


The board is provided with the following
features:

• DRAM Subsystem operates at
  1-0-0-0-0-0-0-0 wait states for read and
  write cycles in the burst mode. The
  DRAM subsystem runs at the maximum
  processor frequency of 16 MHz, using
  100 ns fast page mode DRAMs. The
  DRAM subsystem can accommodate
  from 512 Kbytes to 4 Mbytes, using 4 or 8
  ZIP-packaged DRAMs.

• Interleaved EPROM Subsystem executes
  burst program fetches with a
  2-0-1-0-2-0-1-0 wait state performance.
  The EPROM subsystem accommodates
  four 32-pin or 28-pin 8-bit wide EPROMs
  with up to 150 ns access times.

• Flash EPROM Subsystem reads and
  writes two 8-bit wide Flash EPROMs.

• 8259A Interrupt Controller provides
  expanded interrupt capabilities using
  the i960 SA/SB's interrupt controller
  interface.

• Parallel Port Input allows fast
  downloads of code or data to the
  EV80960SX board. The parallel port
  provides auto-busy and interrupt
  capabilities, and is a full implementation
  of the Centronics standard.


ACE51®, ICE® and MCS® are registered trademarks of Intel Corporation. 
Ethernet® is a registered trademark of Xerox Corporation.
*CHMOS is a patented Intel process.


• Two serial ports provide queued and interrupt driven serial transfer at up
  to 128000 baud.

• 82C54 Timer/Counter provides a 32-bit counter and 16-bit counter, each with
  dedicated interrupts.

• Expansion/Prototype Bus (XBUS) allows expansion cards and prototype hardware
  direct access to the i960 SA/SB’s bus and control signals. Optionally, a
  configurable wait state scheme provides a no-glue interface to most
  peripherals attached to the XBUS.

• LEDs and Switches are user programmable. One 10-segment bar LED, a 7-segment
  LED and an 8-position switch are under program control.

• Local Area Networking (LAN) is implemented using an 82596SX LAN coprocessor.


[Block diagram labels: i960™ SB-16 Processor; 82596SX LAN Controller; Buffers; Buffers/Expansion Connector; Interleaved EPROM; LEDs & DIP Switches]

• Laser Printer Control provides interfaces to TEC or Canon compatible laser
  engines.

• Monitor and Self-test diagnostics are provided for the EV80960SX in the
  EPROMs installed in the board.
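
The wait-state profiles quoted for the DRAM and EPROM subsystems can be turned into rough burst timings. The sketch below is illustrative only and rests on assumptions that are not stated in this brochure: each burst is taken to cost one address cycle, then one data cycle per transfer plus that transfer's wait states, and each transfer is taken to move 2 bytes over a 16-bit bus. The function names are hypothetical.

```python
# Rough burst-timing estimate from a wait-state profile such as "1-0-0-0-0-0-0-0".
# ASSUMPTIONS (not from the brochure): one address cycle per burst, one data
# cycle per transfer plus its wait states, and 2 bytes moved per transfer.

def parse_profile(profile):
    """Turn '2-0-1-0-2-0-1-0' into [2, 0, 1, 0, 2, 0, 1, 0]."""
    return [int(n) for n in profile.split("-")]


def burst_clocks(profile, address_cycles=1):
    """Total clocks for one burst under the assumed bus model."""
    waits = parse_profile(profile)
    return address_cycles + len(waits) + sum(waits)


def burst_time_ns(profile, clock_mhz=16.0):
    """Burst duration in nanoseconds at the given clock frequency."""
    return burst_clocks(profile) * 1000.0 / clock_mhz


def bandwidth_mbytes_per_s(profile, clock_mhz=16.0, bytes_per_transfer=2):
    """Sustained burst bandwidth in 10^6 bytes per second."""
    total_bytes = len(parse_profile(profile)) * bytes_per_transfer
    return total_bytes / burst_time_ns(profile, clock_mhz) * 1000.0


if __name__ == "__main__":
    for name, profile in [("DRAM", "1-0-0-0-0-0-0-0"),
                          ("EPROM", "2-0-1-0-2-0-1-0")]:
        print(name, burst_clocks(profile), "clocks,",
              burst_time_ns(profile), "ns,",
              round(bandwidth_mbytes_per_s(profile), 1), "Mbytes/s")
```

Under these assumptions the DRAM profile comes to 10 clocks per eight-transfer burst at 16 MHz and the EPROM profile to 15 clocks; a different bus model would change the absolute figures but not the comparison between the two profiles.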


The evaluation board comes complete with a 
design database included on diskette, the 
NINDY debug monitor on diskette and in 
EPROM, power and serial cables, schematics 
and user’s manual.


The EV80960SX is a public domain design. The 
hardware is fully documented and provides 
working examples of popular memory and 
peripheral interfaces to the i960 SA/SB
processor. The schematic and PLD database 
are provided with each board. The EV80960SX 
designs are easily duplicated and can be used 
directly as the building blocks for custom 
designs. Custom hardware can be prototyped 
using the expansion bus (XBUS) connector. 


[Figure: EV80960SX Evaluation Board block diagram. Remaining labels: 82C54 Timer/Counter; 8259A Interrupt Controller; Laser Printer Engine Interface; Dual RS-232 Serial (Host Port, User Port); Centronics Parallel Port (INPUT)]

FAX: (714) 838-4151


Avnet Computer 
3170 Pullman Street 
Costa Mesa 92626 
Tel: (714) 641-4121 
FAX: (714) 641-4170 


Avnet Computer 
1361B West 190th Street 
Gardena 90248 

Tel: (800) 345-3870 

FAX: (213) 327-5389 


†Certified VAD


Avnet Computer 

755 Sunrise Blvd., #150 
Roseville 95661 

Tel: (916) 781-2521 
FAX: (916) 781-3819 


Avnet Computer 

1175 Bordeaux Drive, #A 
Sunnyvale 94089 

Tel: (408) 743-3304 

FAX: (408) 743-3348 


Avnet Computer 
21150 Califa Street 
Woodland Hills 91367
Tel: (800) 345-3870
FAX: (818) 594-8333 


†Hamilton/Avnet Electronics
3170 Pullman Street 

Costa Mesa 92626 

Tel: (714) 641-4100 

FAX: (714) 754-6033 


†Hamilton/Avnet Electronics
1175 Bordeaux Drive, #A 
Sunnyvale 94089 

Tel: (408) 743-3300 

FAX: (408) 745-6679 


†Hamilton/Avnet Electronics
4545 Viewridge Avenue 
San Diego 92123 

Tel: (619) 571-1900 

FAX: (619) 571-8761 


†Hamilton/Avnet Electronics
21150 Califa St. 

Woodland Hills 91367 

Tel: (818) 594-0403 

FAX: (818) 594-8234 


†Hamilton/Avnet Electronics
1361B West 190th Street 
Gardena 90248 

Tel: (213) 516-8600 

FAX: (213) 217-6822 


†Hamilton/Avnet Electronics
755 Sunrise Avenue, #150
Roseville 95661

Tel: (916) 925-2216 

FAX: (916) 925-3478 


Pioneer/Technologies Group, Inc. 
134 Rio Robles 
San Jose 95134 

Tel: (408) 954-9100 


FAX: (408) 954-9113


†Wyle Distribution Group
124 Maryland Street 

El Segundo 90245 

Tel: (213) 322-8100 
FAX: (213) 416-1151 


Wyle Distribution Group 
7431 Chapman Ave. 
Garden Grove 92641 
Tel: (714) 891-1717 
FAX: (714) 891-1621 


†Wyle Distribution Group
2951 Sunrise Blvd., Suite 175 
Rancho Cordova 95742 

Tel: (916) 638-5282 

FAX: (916) 638-1491 


†Wyle Distribution Group
9525 Chesapeake Drive 
San Diego 92123 

Tel: (619) 565-9171 

FAX: (619) 365-0512 


†Wyle Distribution Group
3000 Bowers Avenue
Santa Clara 95051

Tel: (408) 727-2500 
FAX: (408) 727-5896 


†Wyle Distribution Group
17872 Cowan Avenue 
Irvine 92714 

Tel: (714) 863-9953 
FAX: (714) 263-0473 


†Wyle Distribution Group
26010 Mureau Road, #150 
Calabasas 91302 

Tel: (818) 880-9000 

FAX: (818) 880-5510 


COLORADO 


Arrow Electronics, Inc. 
3254 C Frazer Street 
Aurora 80011 

Tel: (303) 373-5616 
FAX: (303) 373-5760 


†Hamilton/Avnet Electronics
9605 Maroon Circle, #200 
Englewood 80112 

Tel: (303) 799-7800 

FAX: (303) 799-7801 


†Wyle Distribution Group
451 E. 124th Avenue 
Thornton 80241 

Tel: (303) 457-9953 
FAX: (303) 457-4831 


CONNECTICUT 


†Arrow Electronics, Inc.
12 Beaumont Road 
Wallingford 06492 

Tel: (203) 265-7741 
FAX: (203) 265-7988 


Avnet Computer 

55 Federal Road, #103 
Danbury 06810 

Tel: (203) 797-2880 
FAX: (203) 791-9050 


†Hamilton/Avnet Electronics
55 Federal Road, #103 
Danbury 06810 

Tel: (203) 743-6077 

FAX: (203) 791-9050 


†Pioneer/Standard Electronics
112 Main Street 

Norwalk 06851 

Tel: (203) 853-1515 

FAX: (203) 838-9901 


FLORIDA 


†Arrow Electronics, Inc.
400 Fairway Drive, #102 
Deerfield Beach 33441 
Tel: (305) 429-8200 
FAX: (305) 428-3991 


†Arrow Electronics, Inc.
37 Skyline Drive, #3101 
Lake Mary 32746 
Tel: (407) 333-9300 
FAX: (407) 333-9320 


Avnet Computer
3343 W. Commercial Blvd.
Bldg. C/D, Suite 107

Ft. Lauderdale 33309

Tel: (305) 979-9067
FAX: (305) 730-0368 


Avnet Computer 
3247 Tech Drive North 
St. Petersburg 33716 
Tel: (813) 573-5524 
FAX: (813) 572-4324 


†Hamilton/Avnet Electronics
5371 N.W. 33rd Avenue 

Ft. Lauderdale 33309 

Tel: (305) 484-5016 

FAX: (305) 484-8369 


†Hamilton/Avnet Electronics
3247 Tech Drive North 

St. Petersburg 33716 

Tel: (813) 573-3930 

FAX: (813) 572-4329 


†Hamilton/Avnet Electronics
7079 University Boulevard 
Winter Park 32791 

Tel: (407) 657-3300 

FAX: (407) 678-1878 


†Pioneer/Technologies Group, Inc.
337 Northlake Blvd., Suite 1000
Altamonte Springs 32701

Tel: (407) 834-9090 

FAX: (407) 834-0865 


NORTH AMERICAN DISTRIBUTORS 


Pioneer/Technologies Group, Inc. 
674 S. Military Trail 

Deerfield Beach 33442 

Tel: (305) 428-8877 

FAX: (305) 481-2950 


GEORGIA 


Arrow Commercial System Group 
3400 C. Corporate Way 

Duluth 30136 

Tel: (404) 623-8825 

FAX: (404) 623-8802 


†Arrow Electronics, Inc.

4250 E. Rivergreen Pkwy., #E
Duluth 30136
Tel: (404) 497-1300 

FAX: (404) 476-1493 


Avnet Computer 

3425 Corporate Way, #G 
Duluth 30136 

Tel: (404) 623-5452 

FAX: (404) 476-0125 


Hamilton/Avnet Electronics 
3425 Corporate Way, #G 
Duluth 30136 

Tel: (404) 446-0611 

FAX: (404) 446-1011 


Pioneer/Technologies Group, Inc. 
4250 C. Rivergreen Parkway 
Duluth 30136 

Tel: (404) 623-1003 

FAX: (404) 623-0665 


ILLINOIS 


†Arrow Electronics, Inc.
1140 W. Thorndale Rd. 
Itasca 60143 

Tel: (708) 250-0500 


Avnet Computer 

1124 Thorndale Avenue 
Bensenville 60106 

Tel: (708) 860-8573 
FAX: (708) 773-7976 


†Hamilton/Avnet Electronics
1130 Thorndale Avenue 
Bensenville 60106 

Tel: (708) 860-7700

FAX: (708) 860-8530 


MTI Systems 

1140 W. Thorndale Avenue 
Itasca 60143

Tel: (708) 250-8222 

FAX: (708) 250-8275 


†Pioneer/Standard Electronics
2171 Executive Dr., Suite 200 
Addison 60101 
Tel: (708) 495-9680 
FAX: (708) 495-9831 


INDIANA 


†Arrow Electronics, Inc.
7108 Lakeview Parkway West Dr. 
Indianapolis 46268 

Tel: (317) 299-2071

FAX: (317) 299-2379 


Avnet Computer 
485 Gradle Drive 
Carmel 46032 

Tel: (317) 575-8029 
FAX: (317) 844-4964 


Hamilton/Avnet Electronics 
485 Gradle Drive 

Carmel 46032 

Tel: (317) 844-9333 

FAX: (317) 844-5921 


†Pioneer/Standard Electronics
9350 Priority Way West Dr. 
Indianapolis 46250 
Tel: (317) 573-0880 

FAX: (317) 573-0979 




IOWA 


Hamilton/Avnet Electronics 
2335A Blairsferry Rd., N.E. 
Cedar Rapids 52402 

Tel: (319) 362-4757 

FAX: (319) 393-7050 


KANSAS 


Arrow Electronics, Inc. 

8208 Melrose Dr., Suite 210 
Lenexa 66214 

Tel: (913) 541-9542 

FAX: (913) 541-0328 


Avnet Computer 
15313 W. 95th Street 
Lenexa 66219

Tel: (913) 541-7989 
FAX: (913) 541-7904 


†Hamilton/Avnet Electronics
15313 W. 95th
Overland Park 66215 

Tel: (913) 888-1055 

FAX: (913) 541-7951 


KENTUCKY 


Hamilton/Avnet Electronics 
805 A. Newtown Circle 
Lexington 40511 

Tel: (606) 259-1475 

FAX: (606) 252-3238 


MARYLAND 


Arrow Commercial Systems Group 
200 Perry Parkway

Gaithersburg 20877 

Tel: (301) 670-1600 

FAX: (301) 670-0188 


†Arrow Electronics, Inc.
8300 Guilford Road, #H
Columbia 21046

Tel: (301) 995-6002 
FAX: (301) 995-6201 


Avnet Computer

7172 Columbia Gateway Dr., #G
Columbia 21045 

Tel: (301) 995-0020 

FAX: (301) 995-3515 


†Hamilton/Avnet Electronics
7172 Columbia Gateway Dr., #F 
Columbia 21045 

Tel: (301) 995-3554 

FAX: (301) 995-3515 


†North Atlantic Industries
Systems Division 

7125 Riverwood Dr. 
Columbia 21046 
Tel: (301) 290-3999 


†Pioneer/Technologies Group, Inc.
15810 Gaither Road 

Gaithersburg 20877 

Tel: (301) 921-0660 

FAX: (301) 670-6746 


MASSACHUSETTS 


Arrow Electronics, Inc. 
25 Upton Dr. 
Wilmington 01887 

Tel: (508) 658-0900 
FAX: (508) 694-1754


Avnet Computer 

10 D Centennial Drive 
Peabody 01960 

Tel: (508) 532-9886 
FAX: (508) 532-9660 


†Hamilton/Avnet Electronics
10D Centennial Drive
Peabody 01960 

Tel: (508) 531-7430 

FAX: (508) 532-9802 


†Pioneer/Standard Electronics
44 Hartwell Avenue

Lexington 02173

Tel: (617) 861-9200 

FAX: (617) 863-1547 


Wyle Distribution Group 
15 Third Avenue 
Burlington 01803 

Tel: (617) 272-7300 
FAX: (617) 272-6809 




MICHIGAN 


†Arrow Electronics, Inc.
19880 Haggerty Road 
Livonia 48152 

Tel: (313) 665-4100 
FAX: (313) 462-2686


Avnet Computer 

2876 28th Street, S.W., #5 
Grandville 49418 

Tel: (616) 531-9607 

FAX: (616) 531-0059 


Avnet Computer 
41650 Garden Road 
Novi 48375 

Tel: (313) 347-1820 
FAX: (313) 347-4067 


Hamilton/Avnet Electronics 
2876 28th Street, S.W., #5 
Grandville 49418 

Tel: (616) 243-8805 

FAX: (616) 531-0059 


Hamilton/Avnet Electronics 
41650 Garden Brook Rd., #100 
Novi 48375 

Tel: (313) 347-4270 

FAX: (313) 347-4021 


†Pioneer/Standard Electronics
4505 Broadmoor S.E. 

Grand Rapids 49512 

Tel: (616) 698-1800 

FAX: (616) 698-1831 


†Pioneer/Standard Electronics
13485 Stamford 

Livonia 48150 

Tel: (313) 525-1800 

FAX: (313) 427-3720 


MINNESOTA 


†Arrow Electronics, Inc.
10120A West 76th Street 
Eden Prairie 55344 

Tel: (612) 829-5588 
FAX: (612) 942-7803 


Avnet Computer
10000 West 76th Street
Eden Prairie 55344
Tel: (612) 829-0025 
FAX: (612) 944-2781 


†Hamilton/Avnet Electronics
12400 Whitewater Drive 
Minnetonka 55343 

Tel: (612) 932-0600 

FAX: (612) 932-0613


†Pioneer/Standard Electronics
7625 Golden Triangle Dr., #G
Eden Prairie 55344 

Tel: (612) 944-3355 

FAX: (612) 944-3794 


MISSOURI 


†Arrow Electronics, Inc.
2380 Schuetz Road 

St. Louis 63141 

Tel: (314) 567-6888 
FAX: (314) 567-1164 


Avnet Computer 
739 Goddard Avenue 
Chesterfield 63005 
Tel: (314) 537-2725 
FAX: (314) 537-4248 


†Hamilton/Avnet Electronics
741 Goddard 

Chesterfield 63005 

Tel: (314) 537-1600 

FAX: (314) 537-4248


NEW HAMPSHIRE 


Avnet Computer 

2 Executive Park Drive 
Bedford 03102
Tel: (603) 624-6630 
FAX: (603) 624-2402 


NEW JERSEY 


†Arrow Electronics, Inc.
4 East Stow Road

Unit 11

Marlton 08053 

Tel: (609) 596-8000 
FAX: (609) 596-9632 


†Arrow Electronics, Inc.
6 Century Drive
Parsippany 07054

Tel: (201) 538-0900 
FAX: (201) 538-4962 


Avnet Computer 

1-B Keystone Ave., Bldg. 36 
Cherry Hill 08003 

Tel: (609) 424-8961 

FAX: (609) 751-2502


Avnet Computer 

10 Industrial Road 
Fairfield 07006 

Tel: (201) 882-2879 
FAX: (201) 808-9251 


†Hamilton/Avnet Electronics
1 Keystone Ave., Bldg. 36
Cherry Hill 08003

Tel: (609) 424-0110


FAX: (609) 751-2552 


†Hamilton/Avnet Electronics
10 Industrial 

Fairfield 07006 

Tel: (201) 575-3390 

FAX: (201) 575-5839 


†MTI Systems Sales
6 Century Drive 
Parsippany 07054 
Tel: (201) 539-6496 
FAX: (201) 539-6430 


†Pioneer/Standard Electronics
14-A Madison Rd. 
Fairfield 07006 

Tel: (201) 575-3510 

FAX: (201) 575-3454 


NEW MEXICO 


Alliance Electronics Inc. 
10510 Research Avenue 
Albuquerque 87123 
Tel: (505) 292-3360
FAX: (505) 275-6392 


Avnet Computer 
7801 Academy Road 
Bldg. 1, Suite 204 
Albuquerque 87109 
Tel: (505) 828-9725
FAX: (505) 828-0360 


†Hamilton/Avnet Electronics
7801 Academy Rd. N.E.
Bldg. 1, Suite 204
Albuquerque 87108 

Tel: (505) 765-1500 

FAX: (505) 243-1395 


NEW YORK 


†Arrow Electronics, Inc.

3375 Brighton Henrietta Townline Rd. 
Rochester 14623 

Tel: (716) 427-0300 

FAX: (716) 427-0735 


Arrow Electronics, Inc. 
20 Oser Avenue 
Hauppauge 11788 
Tel: (516) 231-1000 
FAX: (516) 231-1072 


Avnet Computer 

933 Motor Parkway 
Hauppauge 11788 
Tel: (516) 231-9040
FAX: (516) 434-7426


Avnet Computer 
2060 Townline 
Rochester 14623 
Tel: (716) 272-9306 
FAX: (716) 272-9685 


†Hamilton/Avnet Electronics
933 Motor Parkway 
Hauppauge 11788 
Tel: (516) 231-9800 


FAX: (516) 434-7426


†Hamilton/Avnet Electronics
2060 Townline Rd. 
Rochester 14623 

Tel: (716) 292-0730 

FAX: (716) 292-0810 




Hamilton/Avnet Electronics 
103 Twin Oaks Drive 
Syracuse 13120 

Tel: (315) 437-2641 

FAX: (315) 432-0740 


MTI Systems

50 Horseblock Road 
Brookhaven 11719 
Tel: (516) 924-9400 
FAX: (516) 924-1103 


MTI Systems 

1 Penn Plaza 

250 W. 34th Street
New York 10119 


Tel: (212) 643-1280 


FAX: (212) 643-1288


Pioneer/Standard Electronics 
68 Corporate Drive 
Binghamton 13904 
Tel: (607) 722-9300 


FAX: (607) 722-9562 


†Pioneer/Standard Electronics

60 Crossway Park West
Woodbury, Long Island 11797
Tel: (516) 921-8700 

FAX: (516) 921-2143 


†Pioneer/Standard Electronics
840 Fairport Park 

Fairport 14450 

Tel: (716) 381-7070 

FAX: (716) 381-5955


NORTH CAROLINA


†Arrow Electronics, Inc.
5240 Greensdairy Road 
Raleigh 27604 

Tel: (919) 876-3132 
FAX: (919) 878-9517 


Avnet Computer 

2725 Millbrook Rd., #123 
Raleigh 27604 
Tel: (919) 790-1735
FAX: (919) 872-4972 


Hamilton/Avnet Electronics 
5250-77 Center Dr. #350 
Charlotte 28217 

Tel: (704) 527-2485 

FAX: (704) 527-8058 


†Hamilton/Avnet Electronics
3510 Spring Forest Drive 
Raleigh 27604 

Tel: (919) 878-0819 


Pioneer/Technologies Group, Inc. 
9401 L-Southern Pine Blvd.
Charlotte 28210 

Tel: (704) 527-8188 

FAX: (704) 522-8564


Pioneer/Technologies Group, Inc.
2810 Meridian Parkway, #148
Durham 27713 

Tel: (919) 544-5400

FAX: (919) 544-5885 


OHIO 


Arrow Commercial System Group 
284 Cramer Creek Court 
Dublin 43017 

Tel: (614) 889-9347 

FAX: (614) 889-9680 


†Arrow Electronics, Inc.
6573 Cochran Road, #E 
Solon 44139 
Tel: (216) 248-3990 
FAX: (216) 248-1106 


Arrow Electronics, Inc. 

8200 Washington Village Dr. 
Centerville 45458 

Tel: (513) 435-5563 

FAX: (513) 435-2049 




Avnet Computer 

7764 Washington Village Dr. 
Dayton 45459 

Tel: (513) 439-6756 

FAX: (513) 439-6719 


Avnet Computer 

30325 Bainbridge Rd., Bldg. A 
Solon 44139 

Tel: (216) 349-2505 

FAX: (216) 349-1894 


†Hamilton/Avnet Electronics
7760 Washington Village Dr. 
Dayton 45459 

Tel: (513) 439-6733 

FAX: (513) 439-6711 


†Hamilton/Avnet Electronics
30325 Bainbridge 

Solon 44139 

Tel: (800) 543-2984 

FAX: (216) 349-1894 


Hamilton/Avnet Electronics 

2600 Corp Exchange Drive, #180 
Columbus 43231 

Tel: (614) 882-7004 

FAX: (614) 882-8650 


MTI Systems Sales 

23404 Commerce Park Road 
Beachwood 44122

Tel: (216) 464-6688 

FAX: (216) 464-3564 


†Pioneer/Standard Electronics
4433 Interpoint Boulevard 
Dayton 45424 

Tel: (513) 236-9900 

FAX: (513) 236-8133 


†Pioneer/Standard Electronics
4800 E. 131st Street
Cleveland 44105 

Tel: (216) 587-3600 

FAX: (216) 663-1004 


OKLAHOMA 


Arrow Electronics, Inc. 

12111 East 51st Street, #101 
Tulsa 74146 

Tel: (918) 252-7537 

FAX: (918) 254-0917 


†Hamilton/Avnet Electronics
12121 E. 51st St., Suite 102A 
Tulsa 74146 

Tel: (918) 664-0444 

FAX: (918) 250-8763 


OREGON 


†Almac Electronics Corp.
1885 N.W. 169th Place 
Beaverton 97006 

Tel: (503) 629-8090 
FAX: (503) 645-0611


Avnet Computer 

9409 Southwest Nimbus Ave. 
Beaverton 97005 

Tel: (503) 627-0900 

FAX: (503) 526-6242 


†Hamilton/Avnet Electronics
9409 S.W. Nimbus Ave. 
Beaverton 97005 

Tel: (503) 627-0201 

FAX: (503) 641-4012 


Wyle 

9640 Sunshine Court 
Bldg. G, Suite 200 
Beaverton 97005
Tel: (503) 643-7900 
FAX: (503) 646-5466 


PENNSYLVANIA 


Avnet Computer 

213 Executive Drive, #320 
Mars 16046 

Tel: (412) 772-1888 

FAX: (412) 772-1890 


Hamilton/Avnet Electronics 
213 Executive, #320 

Mars 16046

Tel: (412) 281-4152 

FAX: (412) 772-1890 




Pioneer/Technologies Group, Inc. 
259 Kappa Drive 

Pittsburgh 15238 

Tel: (412) 782-2300 

FAX: (412) 963-8255 


†Pioneer/Technologies Group, Inc.
500 Enterprise Road 

Keith Valley Business Center 
Horsham 19044 

Tel: (215) 674-4000 

FAX: (215) 674-3107 


TENNESSEE 


Arrow Commercial System Group 
3635 Knight Road, #7 

Memphis 38118 

Tel: (901) 367-0540 

FAX: (901) 367-2081


TEXAS 


Arrow Electronics, Inc. 
3220 Commander Drive 
Carrollton 75006
Tel: (214) 380-6464 
FAX: (214) 248-7208 


Avnet Computer 

4004 Beltline, Suite 200 
Dallas 75244 

Tel: (214) 308-8181 
FAX: (214) 308-8129 


Avnet Computer

1235 North Loop West, #525 
Houston 77008 

Tel: (713) 867-7500 

FAX: (713) 861-6851 


†Hamilton/Avnet Electronics
1826-F Kramer Lane 

Austin 78758 

Tel: (800) 772-5668 

FAX: (512) 832-4315 


†Hamilton/Avnet Electronics
4004 Beltline, #200 

Dallas 75244 

Tel: (214) 308-8111 

FAX: (214) 308-8109 


†Hamilton/Avnet Electronics
1235 N. Loop West, #521 
Houston 77008 

Tel: (713) 240-7733 

FAX: (713) 861-6541 


†Pioneer/Standard Electronics
1826-D Kramer Lane 

Austin 78758 

Tel: (512) 835-4000 

FAX: (512) 835-9829 


†Pioneer/Standard Electronics
13765 Beta Road 

Dallas 75244 

Tel: (214) 386-7300 

FAX: (214) 490-6419 


†Pioneer/Standard Electronics
10530 Rockley Road, #100 
Houston 77099 

Tel: (713) 495-4700 

FAX: (713) 495-5642 


†Wyle Distribution Group
1810 Greenville Avenue
Richardson 75081

Tel: (214) 235-9953 
FAX: (214) 644-5064 


Wyle Distribution Group 

4030 West Braker Lane, #330 
Austin 78758

Tel: (512) 345-8853 

FAX: (512) 345-9330 


Wyle Distribution Group 
11001 South Wilcrest, #100 
Houston 77099 

Tel: (713) 879-9953 

FAX: (713) 879-6540 


UTAH 


Arrow Electronics, Inc. 
1946 W. Parkway Blvd. 
Salt Lake City 84119 
Tel: (801) 973-6913 


Avnet Computer 

1100 E. 6600 South, #150 
Salt Lake City 84121 

Tel: (801) 266-1115 

FAX: (801) 266-0362 


Avnet Computer 
17761 Northeast 78th Place 
Redmond 98052 

Tel: (206) 867-0160 

FAX: (206) 867-0161 


†Hamilton/Avnet Electronics
1100 East 6600 South, #120 
Salt Lake City 84121 

Tel: (801) 972-2800 

FAX: (801) 263-0104 


†Wyle Distribution Group
1325 West 2200 South, #E 


West Valley 84119 


Tel: (801) 974-9953
FAX: (801) 972-2524


WASHINGTON


†Almac Electronics Corp.
14360 S.E. Eastgate Way 
Bellevue 98007 

Tel: (206) 643-9992 

FAX: (206) 643-9709 


†Hamilton/Avnet Electronics
17761 N.E. 78th Place, #C
Redmond 98052

Tel: (206) 241-8555


FAX: (206) 241-5472 


Wyle Distribution Group 


15385 N.E. 90th Street 
Redmond 98052 

Tel: (206) 881-1150 
FAX: (206) 881-1567 


WISCONSIN 


Arrow Electronics, Inc.
200 N. Patrick Blvd., Ste. 100


Brookfield 53005 
Tel: (414) 792-0150 
FAX: (414) 792-0156 


Avnet Computer 
20875 Crossroads Circle, #400 


Waukesha 53186
Tel: (414) 784-8205


FAX: (414) 784-6006 


†Hamilton/Avnet Electronics


28875 Crossroads Circle, #400 
Waukesha 53186 
Tel: (414) 784-4510 


FAX: (414) 784-9509


Pioneer/Standard Electronics 
120 Bishops Way #163 
Brookfield 53005 

Tel: (414) 784-3480 


ALASKA


Avnet Computer 

1400 West Benson Blvd. 
Suite 400 

Anchorage 99503

Tel: (907) 274-9899 


FAX: (907) 277-2639


CANADA 
ALBERTA 

Avnet Computer 

2816 21st Street Northeast 
Calgary T2E 6Z2


Tel: (403) 291-3284 
FAX: (403) 250-1591 


Zentronics 

6815 8th Street N.E., #100 
Calgary T2E 7H

Tel: (403) 295-8838 

FAX: (403) 295-8714 


BRITISH COLUMBIA 


†Hamilton/Avnet Electronics
8610 Commerce Court
Burnaby V5A 4N6

Tel: (604) 420-4101 

FAX: (604) 420-5376 


Zentronics 

11400 Bridgeport Rd., #108 
Richmond V6X 1T2 
Tel: (604) 273-5575 


FAX: (604) 273-2413 


ONTARIO 


Arrow Electronics, Inc.

36 Antares Dr., Unit 100
Nepean K2E 7W5 

Tel: (613) 226-6903 

FAX: (613) 723-2018 


†Arrow Electronics, Inc.
1093 Meyerside, Unit 2 
Mississauga L5T 1M4 
Tel: (416) 670-7769 
FAX: (416) 670-7781 


Avnet Computer


Canada System Engineering 
Group 

3688 Nashua Dr., Unit 6 
Mississauga L4V 1M5

Tel: (416) 672-8638 

FAX: (416) 677-5091 


Avnet Computer 

6845 Rexwood Road 
Units 7-9 
Mississauga L4V 1M4
Tel: (416) 672-8638 
FAX: (416) 672-8650 


Avnet Computer 
190 Colonade Road 
Nepean K2E 7J5 


Tel: (613) 727-7529 


FAX: (613) 226-1184


†Hamilton/Avnet Electronics
6845 Rexwood Rd., Units 3-5


Mississauga L4T 1R2 
Tel: (416) 677-7432 
FAX: (416) 677-0940 


†Hamilton/Avnet Electronics
190 Colonade Road 
Nepean K2E 7J5 


Tel: (613) 226-1700
FAX: (613) 226-1184 


†Zentronics
1355 Meyerside Drive
Mississauga L5T 1C9


Tel: (416) 564-9600 


FAX: (416) 564-3127


†Zentronics

155 Colonade Rd., South
Unit 17

Nepean K2E 7K1

Tel: (613) 226-8840


FAX: (613) 226-6352 


QUEBEC


Arrow Electronics Inc. 


1100 St. Regis Blvd. 


Dorval H9P 2T5

Tel: (514) 421-7411


FAX: (514) 421-7430 


Arrow Electronics, Inc. 

500 Boul. St-Jean-Baptiste Ave. 
Quebec H2E 5R9 

Tel: (418) 871-7500 

FAX: (418) 871-6816 


Avnet Computer 
2795 Rue Halpern 


St. Laurent H4S 1P8


Tel: (514) 335-2483 
FAX: (514) 335-2481 


†Hamilton/Avnet Electronics
2795 Halpern 

St. Laurent H4S 1P8 

Tel: (514) 335-1000 

FAX: (514) 335-2481 


†Zentronics

520 McCaffrey 

St. Laurent H4T 1N3 
Tel: (514) 737-9700 
FAX: (514) 737-5212 




FINLAND


Intel Finland OY
Ruosilantie 2
00390 Helsinki
Tel: (358) 0 544 644
FAX: (358) 0 544 030


FRANCE 


Intel Corporation S.A.R.L. 
1, Rue Edison-BP 303 


78054 St. Quentin-en-Yvelines
Cedex
Tel: (33) (1) 30 57 70 00
FAX: (33) (1) 30 64 60 32


AUSTRIA


Bacher Electronics GmbH 


Rotenmuehlgasse 26
A-1120 Wien 

Tel: 43 222 81356460 
FAX: 43 222 834276


BELGIUM 


Inelco Belgium S.A. 
Oorlogskruisenlaan 94 
B-1120 Bruxelles 

Tel: 32 2 244 2811 

FAX: 32 2 216 4301 


FRANCE 


Almex
48, Rue de l'Aubepine 
B.P. 102 

92164 Antony Cedex 
Tel: 33 1 4096 5400 
FAX: 33 1 4666 6028


Lex Electronics 

Silic 585 

60 Rue des Gemeaux 
94663 Rungis Cedex

Tel: 33 1 4978 4978


FAX: 33 1 4978 0596 


Metrologie 
Tour d’Asnieres
4, Avenue Laurent Cely 


92606 Asnieres Cedex 


Tel: 33 1 4790 6240 
FAX: 33 1 4790 5947 


Tekelec-Airtronic
Cite Des Bruyeres
Rue Carle Vernet 
BP 2 
92310 Sevres 
Tel: 33 1 4623 2425 
FAX: 33 1 4507 2191 


GERMANY 


E2000 Vertriebs-AG 

Stahlgruberring 12
8000 Muenchen 82 
Tel: 49 89 420010 
FAX: 49 89 42001209 


Jermyn GmbH 

Im Dachsstueck 9
6250 Limburg 

Tel: 49 6431 5080 
FAX: 49 6431 508289 


Metrologie GmbH 
Steinerstrasse 15 
8000 Muenchen 70 
Tel: 49 89 724470 
FAX: 49 89 72447111 


EUROPEAN SALES OFFICES 


GERMANY 


Intel GmbH 
Dornacher Strasse 1 


8016 Feldkirchen bei Muenchen 


Tel: (49) 089/90992-0 
FAX: (49) 089/9043948 


ISRAEL 
Intel Semiconductor Ltd. 


Atidim Industrial Park-Neve Sharet


P.O. Box 43202 
Tel-Aviv 61430 

Tel: (972) 03 498080 
FAX: (972) 03 491870 


Proelectron Vertriebs GmbH 
Max-Planck-Strasse 1-3 
6072 Dreieich 

Tel: 49 6103 304343 

FAX: 49 6103 304425 


Rein Electronik GmbH


Loetscher Weg 66 
4054 Nettetal 1

Tel: 49 2153 7330 
FAX: 49 2153 733513 


GREECE 

Pouliadis Associates Corp. 
5 Koumbari Street 
Kolonaki Square

10674 Athens 


Tel: 30 1 360 3741
FAX: 30 1 360 7501 


IRELAND 


Micro Marketing 


Tany Hall 


Eglinton Terrace 
Dundrum 

Dublin 

Tel: 0001 989 400 


FAX: 0001 989 8282 


ISRAEL


Eastronics Ltd.
Rozanis 11


P.O.B. 39300 


Tel Baruch
Tel-Aviv 61392


Tel: 972 3 475151
FAX: 972 3 475125


ITALY


Celdis Spa 

Via F.lli Gracchi 36
20092 Cinisello Balsamo 
Milano


Tel: 39 2 66012003
FAX: 39 2 6182433


Intesi Div. Della Deutsche


Divisione ITT 
Industries GmbH 

P.I. 06550110156
Milanofiori Palazzo E5 
20094 Assago (Milano) 


Tel: 39 2 824701 
FAX: 39 2 8242631 


ITALY


Intel Corporation Italia S.p.A. 
Milanofiori Palazzo E 

20094 Assago 

Milano 

Tel: (39) (02) 89200950 


FAX: (39) (2) 3498464


NETHERLANDS 
Intel Semiconductor B.V.


Postbus 84130


3009 CC Rotterdam
Tel: (31) 10 407 11 11
FAX: (31) 10 455 4688 


Lasi Elettronica S.p.A. 
P.I. 00839000155 

Viale Fulvio Testi, N.280 
20126 Milano 

Tel: 39 2 66101370 
FAX: 39 2 66101385 


Telcom s.r.l. - Divisione MDS
Via Trombetta 
Zona Marconi 


Strada Cassanese 


Segrate-Milano


Tel: 39 2 2138010


FAX: 39 2 216061 


NETHERLANDS


Koning en Hartman B.V.


Energieweg 1 
2627 AP Delft 


Tel: 31 15 609 906


FAX: 31 15 619 194 


PORTUGAL 
ATD Electronica LDA 


Rua Dr. Faria de


Vasconcelos, 3a 
1900 Lisboa 
Tel: 351 1 8472200


FAX: 351 1 8472197


SPAIN 
ATD Electronica


Plaza Ciudad de Viena, 6 
28040 Madrid 

Tel: 34 1 534 4000/09 
FAX: 34 1 534 7663 


Metrologia Iberica 

Ctra De Fuencarral N.80 
28100 Alcobendas
Madrid 

Tel: 34 1 6538611 


FAX: 34 1 6517549


SCANDINAVIA 


OY Fintronic AB 
Heikkilantie 2a 
SF-00210 Helsinki 
Tel: 358 0 6926022 


FAX: 358 0 6821251


SPAIN 


Intel Iberia S.A. 
Zurbaran, 28

28010 Madrid 

Tel: (34) 308 25 52 
FAX: (34) 410 7570 


SWEDEN


Intel Sweden A.B.
Dalvagen 24 

171 36 Solna 

Tel: (46) 8 734 01 00 
FAX: (46) 8 278085


ITT Multikomponent A/S 
Naverland 29 

DK-2600 Glostrup 
Denmark 

Tel: 010 45 42 451822 
FAX: 010 45 42 457624 


Nordisk Elektronik A/S 
Postboks 122
Smedsvingen 4
N-1364 Hvalstad
Norway

Tel: 47 2 846210
FAX: 47 2 846545


Nordisk Electronik AB 
Box 36
Torshamnsgatan 39
S-16493 Kista
Sweden
Tel: 46 8 7034630 
FAX: 46 8 7039845 


SWITZERLAND 
Industrade AG
Hertistrasse 31
CH-8304 Wallisellen


Tel: 41 1 8328111
FAX: 41 1 8307550 


TURKEY 


EMPA 

80050 Sishane

Refik Saydam Cad No. 89/5
Istanbul

Tel: 90 1 143 6212

FAX: 90 1 143 6547


UNITED KINGDOM 
Access Elect Comp Ltd. 


Jubilee House
Jubilee Road
Letchworth,
Hertfordshire
SG6 1QH

Tel: 0462 480888
FAX: 0462 682467 


Bytech Components Ltd. 
12a Cedarwood 
Chineham Business Park 
Crockford Lane
Basingstoke

Hants RG12 1RW

Tel: 0256 707107

FAX: 0256 707162


UNITED KINGDOM 


Intel Corporation (U.K.) Ltd. 
Pipers Way 

Swindon, Wiltshire SN3 1RJ 
Tel: (44) (0793) 696000 
FAX: (44) (0793) 641440 


EUROPEAN DISTRIBUTORS/REPRESENTATIVES 


Bytech Systems 
Unit 3

The Western Centre
Western Road
Bracknell
Berks RG12 1RW
Tel: 0344 55333
FAX: 0344 867270


Metrologie 
Rapid House
Oxford Road
High Wycombe
Bucks


Herts HP11 2EE


Tel: 0494 474147 
FAX: 0494 452144 


Jermyn

Vestry Estate 
Otford Road 
Sevenoaks 
Kent TN14 5EU 
Tel: 0732 450144 
FAX: 0732 451251


MMD 

3 Bennet Court 
Bennet Road 
Reading 
Berkshire RG2 0QX 
Tel: 0734 313232 
FAX: 0734 313255 


Rapid Silicon 

3 Bennet Court 
Bennet Road 
Reading 

Berks RG2 0QX 
Tel: 0734 752266 
FAX: 0734 312728 


Metro Systems 
Rapid House 

Oxford Road 

High Wycombe 
Bucks HP11 2EE
Tel: 0494 474171 
FAX: 0494 21860 


YUGOSLAVIA 


H.R. Microelectronics Corp. 
2005 de la Cruz Blvd. 
Suite 220 

Santa Clara, CA 95050 
U.S.A 


Tel: (408) 988-0286 
FAX: (408) 988-0306 


AUSTRALIA 


Intel Australia Pty. Ltd. 

Unit 13 

Allambie Grove Business Park 
25 Frenchs Forest Road East 
Frenchs Forest, NSW, 2086 
Sydney 

Tel: 61-2-975-3300 

FAX: 61-2-975-3375 


Intel Australia Pty. Ltd.
711 High Street 

1st Floor 

East Kew, Vic., 3102
Melbourne 

Tel: 61-3-810-2141 
FAX: 61-3-819-7200


BRAZIL 


Intel Semicondutores do Brazil LTDA
Avenida Paulista, 1159-CJS 404/405 
01311 - Sao Paulo - S.P. 

Tel: 55-11-287-5899 

TLX: 11-37-557-ISDB 

FAX: 55-11-287-5119 


CHINA/HONG KONG 


Intel PRC Corporation 
15/F, Office 1, Citic Bldg. 
Jian Guo Men Wai Street 
Beijing, PRC 

Tel: (1) 500-4850 

TLX: 22947 INTEL CN 
FAX: (1) 500-2953 


INTERNATIONAL DISTRIBUTORS/REPRESENTATIVES


ARGENTINA 


Dafsys S.R.L. 
Chacabuco, 90-6 Piso 
1069-Buenos Aires 
Tel: 54-1-34-7726 
FAX: 54-1-34-1871 


AUSTRALIA 


Email Electronics 

15-17 Hume Street 
Huntingdale, 3166 

Tel: 011-61-3-544-8244 
TLX: AA 30895 

FAX: 011-61-3-543-8179 


NSO-Australia 

205 Middleborough Rd. 
Box Hill, Victoria 3128 
Tel: 03 8900970 

FAX: 03 8990819 


BRAZIL 


Microlinear 

Largo do Arouche, 24 
01219 Sao Paulo, SP 
Tel: 5511-220-2215 
FAX: 5511-220-5750 


CHILE


Sisteco 

Vecinal 40, Las Condes
Santiago 

Tel: 562-234-1644 

FAX: 562-233-9895 


CHINA/HONG KONG 


Novel Precision Machinery Co., Ltd. 
Room 728 Trade Square 

681 Cheung Sha Wan Road 
Kowloon, Hong Kong 

Tel: (852) 360-8999 

TWX: 32032 NVTNL HX 

FAX: (852) 725-3695 


GUATEMALA 
Abinitio 

11 Calle 2—Zona9Q 
Guatemala City 


Tel: 5022-32-4104 
FAX: 5022-32-4123 


*Field Application Location 


INTERNATIONAL SALES OFFICES 


Intel Semiconductor Ltd.* 
10/F East Tower 

Bond Center 
Queensway, Central 
Hong Kong 

Tel: (852) 844-4555 

FAX: (852) 868-1989 


INDIA 


Intel Asia Electronics, Inc. 
4/2, Samrah Plaza 

St. Mark's Road 

Bangalore 560001 

Tel: 91-812-215773 

TLX: 953-845-2646 INTEL IN 
FAX: 091-812-215067 


JAPAN 


Intel Japan K.K. 

5-6 Tokodai, Tsukuba-shi 
Ibaraki, 300-26 

Tel: 0298-47-8511

FAX: 0298-47-8450 


Intel Japan K.K.* 
Hachioji ON Bldg. 
4-7-14 Myojin-machi 
Hachioji-shi, Tokyo 192 
Tel: 0426-48-8770 
FAX: 0426-48-8775 


INDIA


Micronic Devices 

Arun Complex 

No. 65 D.V.G. Road 

Basavanagudi 

Bangalore 560 004 

Tel: 011-91-812-600-631 
011-91-812-611-365 

TLX: 9538458332 MDBG 


Micronic Devices 

No. 516 5th Floor 
Swastik Chambers 

Sion, Trombay Road 
Chembur 

Bombay 400 071 

TLX: 9531 171447 MDEV 


Micronic Devices 

25/8, 1st Floor 

Bada Bazaar Marg 

Old Rajinder Nagar 

New Delhi 110 060 

Tel: 011-91-11-5723509 
011-91-11-589771 

TLX: 031-63253 MDND IN 


Micronic Devices 

6-3-348/12A Dwarakapuri Colony 
Hyderabad 500 482 

Tel: 011-91-842-226748 


S&S Corporation 
1587 Kooser Road 
San Jose, CA 95118 
Tel: (408) 978-6216 
TLX: 820281 

FAX: (408) 978-8635 


JAMAICA 


MC Systems 

10-12 Grenada Crescent 

Kingston 5 

Tel: (809) 929-2638 
(809) 926-0188 

FAX: (809) 926-0104 


JAPAN 


Asahi Electronics Co. Ltd. 
KMM Bldg. 2-14-1 Asano 
Kokurakita-ku 
Kitakyushu-shi 802 

Tel: 093-511-6471 

FAX: 093-551-7861 


Intel Japan K.K.* 

Bldg. Kumagaya 

2-69 Hon-cho 
Kumagaya-shi, Saitama 360 
Tel: 0485-24-6871 

FAX: 0485-24-7518 


Intel Japan K.K.* 
Kawa-asa Bldg. 

2-11-5 Shin-Yokohama 
Kohoku-ku, Yokohama-shi 
Kanagawa, 222 

Tel: 045-474-7661 

FAX: 045-471-4394 


Intel Japan K.K.* 
Ryokuchi-Eki Bldg. 

2-4-1 Terauchi 
Toyonaka-shi, Osaka 560 
Tel: 06-863-1091 

FAX: 06-863-1084 


Intel Japan K.K. 
Shinmaru Bldg. 

1-5-1 Marunouchi 
Chiyoda-ku, Tokyo 100 
Tel: 03-3201-3621 

FAX: 03-3201-6850 


Intel Japan K.K. 
Green Bldg. 

1-16-20 Nishiki 
Naka-ku, Nagoya-shi 
Aichi 460 

Tel: 052-204-1261 
FAX: 052-204-1285 


CTC Components Systems Co., Ltd. 
4-8-1 Dobashi, Miyamae-ku 
Kawasaki-shi, Kanagawa 213 


Tel: 044-852-5121 


FAX: 044-877-4268 


Dia Semicon Systems, Inc. 
Flower Hill Shinmachi Higashi-kan 
1-23. Shinmachi, Setagaya-ku 
Tokyo 154 

Tel: 03-3439-1600 

FAX: 03-3439-1601 


Okaya Koki 

2-4-18 Sakae 

Naka-ku, Nagoya-shi 460 
Tel: 052-204-8315 

FAX: 052-204-8380 


Ryoyo Electro Corp. 
Konwa Bldg. 
1-12-22 Tsukiji 
Chuo-ku, Tokyo 104 
Tel: 03-3546-5011 
FAX: 03-3546-5044 


KOREA 


J-Tek Corporation 

Dong Sung Bldg. 9/F 

158-24, Samsung-Dong, Kangnam-Ku 
Seoul 135-090

Tel: (822) 557-8039 

FAX: (822) 557-8304 


Samsung Electronics 

Samsung Main Bldg. 

150 Taepyung-Ro-2KA, Chung-Ku 
Seoul 100-102 

C.P.O. Box 8780 

Tel: (822) 751-3680 

TWX: KORSST K 27970 

FAX: (822) 753-9065 


MEXICO 


PSI S.A. de C.V. 

Fco. Villa esq. Ajusco s/n

Cuernavaca, MOR 62130 

Tel: 52-73-13-9412
52-73-17-5340 

FAX: 52-73-17-5333 


NEW ZEALAND 


Email Electronics 

36 Olive Road 
Penrose, Auckland 
Tel: 011-64-9-591-155 
FAX: 011-64-9-592-681 


KOREA 


Intel Korea, Ltd. 

16th Floor, Life Bldg. 

61 Yoido-dong, Youngdeungpo-Ku 
Seoul 150-010 

Tel: (2) 784-8186 

FAX: (2) 784-8096 


SINGAPORE 


Intel Singapore Technology, Ltd. 
101 Thomson Road #08-03/06 
United Square 

Singapore 1130 

Tel: (65) 250-7811 

FAX: (65) 250-9256 


TAIWAN 


Intel Technology Far East Ltd. 
Taiwan Branch Office 

8th Floor, No. 205 

Bank Tower Bldg. 

Tung Hua N. Road 

Taipei 

Tel: 886-2-5144202 

FAX: 886-2-717-2455 


DISTRIBUTORS/REPRESENTATIVES


SAUDI ARABIA 


AAE Systems, Inc. 
642 N. Pastoria Ave. 
Sunnyvale, CA 94086 
U.S.A.


Tel: (408) 732-1710 
FAX: (408) 732-3095 
TLX: 494-3405 AAE SYS 


SINGAPORE 


Electronic Resources Pte, Ltd. 
17 Harvey Road 

#03-01 Singapore 1336 

Tel: (65) 283-0888 

TWX: RS 56541 ERS 

FAX: (65) 289-5327 


SOUTH AFRICA 


Electronic Building Elements 

178 Erasmus St. (off Watermeyer St.)
Meyerspark, Pretoria, 0184 

Tel: 011-2712-803-7680 

FAX: 011-2712-803-8294 


TAIWAN 


Micro Electronics Corporation 
12th Floor, Section 3 

285 Nanking East Road 
Taipei, R.O.C. 

Tel: (886) 2-7198419 

FAX: (886) 2-7197916 


Acer Sertek Inc. 

15th Floor, Section 2 
Chien Kuo North Rd. 
Taipei 18479 R.O.C. 
Tel: 886-2-501-0055 
TWX: 23756 SERTEK 
FAX: (886) 2-5012521 


URUGUAY 


Interfase 

Zabala 1378 

11000 Montevideo 

Tel: 5982-96-0490 
5982-96-1143 

FAX: 5982-96-2965 


VENEZUELA 


Unixel C.A. 

4 Transversal de Monte Cristo 
Edf. AXXA, Piso 1, of. 1&2 
Centro Empresarial Boleita 
Caracas 

Tel: 582-238-6082 

FAX: 582-238-1816 




Intel Corp. 

c/o TransAlaska Network 
1515 Lore Rd. 
Anchorage 99507 

Tel: (907) 522-1776 


Intel Corp.

c/o TransAlaska Data Systems 
c/o GCI Operations 

520 Fifth Ave., Suite 407 
Fairbanks 99701 

Tel: (907) 452-6264 


ARIZONA 


*Intel Corp. 

410 North 44th Street
Suite 500
Phoenix 85008

Tel: (602) 231-0386

FAX: (602) 244-0446 


*Intel Corp. 

500 E. Fry Blvd., Suite M-15
Sierra Vista 85635 

Tel: (602) 459-5010 


ARKANSAS 


Intel Corp. 

c/o Federal Express 
1500 West Park Drive 
Little Rock 72204 


CALIFORNIA 


*Intel Corp. 

21515 Vanowen St., Ste. 116 
Canoga Park 91303 

Tel: (818) 704-8500 


*Intel Corp.

300 N. Continental Blvd. 
Suite 100 

El Segundo 90245 

Tel: (213) 640-6040 


*Intel Corp. 

1900 Prairie City Rd.
Folsom 95630-9597
Tel: (916) 351-6143


*Intel Corp.

9665 Chesapeake Dr., Suite 325 
San Diego 92123 
Tel: (619) 292-8086 


**Intel Corp. 

400 N. Tustin Avenue 
Suite 450 

Santa Ana 92705 

Tel: (714) 835-9642 


**Intel Corp. 

2700 San Tomas Exp., 1st Floor 
Santa Clara 95051 

Tel: (408) 970-1747 


COLORADO 


*Intel Corp. 

600 S. Cherry St., Suite 700 
Denver 80222 

Tel: (303) 321-8086 


ARIZONA 


2402 W. Beardsley Road 

Phoenix 85027 

Tel: (602) 869-4288 
1-800-468-3548 


MINNESOTA 


3500 W. 80th Street 
Suite 360 

Bloomington 55431 
Tel: (612) 835-6722 


*Carry-in locations
**Carry-in/mail-in locations 


NORTH AMERICAN SERVICE OFFICES 


CONNECTICUT 


*Intel Corp. 

301 Lee Farm Corporate Park 
83 Wooster Heights Rd. 
Danbury 06811 


Tel: (203) 748-3130


FLORIDA 


**Intel Corp. 

800 Fairway Dr., Suite 160 
Deerfield Beach 33441 
Tel: (305) 421-0506 

FAX: (305) 421-2444 


*Intel Corp. 

5850 T.G. Lee Blvd., Ste. 340 
Orlando 32822 

Tel: (407) 240-8000 


GEORGIA 


*Intel Corp. 

20 Technology Park, Suite 150 
Norcross 30092 

Tel: (404) 449-0541 


5523 Theresa Street 
Columbus 31907 


HAWAII


**Intel Corp.
Honolulu 96820
Tel: (808) 847-6738


ILLINOIS


**Intel Corp.


Woodfield Corp. Center III


300 N. Martingale Rd., Ste. 400 
Schaumburg 60173 
Tel: (708) 605-8031 


INDIANA 


*Intel Corp. 
8910 Purdue Rd., Ste. 350 
Indianapolis 46268 


Tel: (317) 875-0623


KANSAS 


*Intel Corp. 

10985 Cody, Suite 140 
Overland Park 66210 
Tel: (913) 345-2727 


KENTUCKY 


Intel Corp. 

133 Walton Ave., Office 1A 
Lexington 40508 

Tel: (606) 255-2957 


Intel Corp.
896 Hillcrest Road, Apt. A 
Radcliff 40160 (Louisville) 


LOUISIANA 


Hammond 70401 
(serviced from Jackson, MS) 


MARYLAND 


**Intel Corp. 

10010 Junction Dr., Suite 200 
Annapolis Junction 20701 

Tel: (301) 206-2860 


MASSACHUSETTS 


**Intel Corp. 

Westford Corp. Center 
3 Carlisle Rd., 2nd Floor 
Westford 01886 

Tel: (508) 692-0960 


MICHIGAN 


*Intel Corp. 

7071 Orchard Lake Rd., Ste. 100 
West Bloomfield 48322 

Tel: (313) 851-8905 


MINNESOTA 


*Intel Corp.
3500 W. 80th St., Suite 360 
Bloomington 55431 

Tel: (612) 835-6722 


MISSISSIPPI 


Intel Corp.


c/o Compu-Care 

2001 Airport Road, Suite 205F 
Jackson 39208 

Tel: (601) 932-6275 


MISSOURI 


*Intel Corp. 


3300 Rider Trail South 
Suite 170 

Earth City 63045 

Tel: (314) 291-1990 


Intel Corp. 

Route 2, Box 221 
Smithville 64089 
Tel: (913) 345-2727 


NEW JERSEY 


**Intel Corp. 

300 Sylvan Avenue 
Englewood Cliffs 07632 
Tel: (201) 567-0821 


*Intel Corp. 

Lincroft Office Center 
125 Half Mile Road 
Red Bank 07701 

Tel: (908) 747-2233 


NEW MEXICO 


Intel Corp. 

Rio Rancho 1 

4100 Sara Road 

Rio Rancho 87124-1025 
(near Albuquerque) 

Tel: (505) 893-7000 


NEW YORK 


*Intel Corp. 

2950 Expressway Dr. South 
Suite 130 

Islandia 11722 

Tel: (516) 231-3300 


Intel Corp. 

300 Westage Business Center 
Suite 230 

Fishkill 12524 

Tel: (914) 897-3860 


Intel Corp. 

5858 East Molloy Road 
Syracuse 13211 

Tel: (315) 454-0576 


NORTH CAROLINA 


*Intel Corp. 

5800 Executive Center Drive 
Suite 105 

Charlotte 28212 

Tel: (704) 568-8966 


**Intel Corp. 

5540 Centerview Dr., Suite 215 
Raleigh 27606 

Tel: (919) 851-9537 


OHIO 
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Multimedia and 


Supercomputing Processors 

Intel Corporation’s Multimedia and 
Supercomputing Components Group products 
enrich computerized information and exchange 
technologies in imaginative new ways never 
before possible. To learn more about Intel's
problem-solving MSCG products (the i750®
video processor and the i860™ and i960™
microprocessor families), you will want to read
this publication.
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